THERMUS SCOTODUCTUS NUCLEIC ACID POLYMERASES

This application relates to U.S. Provisional Application No. 60/322,218,
filed September 14, 2001 and U.S. Provisional Application No. 60/334,489, filed
November 30, 2001.

FIELD OF THE INVENTION 

The invention relates to nucleic acids and polypeptides for nucleic acid
polymerases from thermophilic strains of Thermus scotoductus.

BACKGROUND OF THE INVENTION 

DNA polymerases are naturally-occurring intracellular enzymes used by
a cell for replicating DNA by reading one nucleic acid strand and manufacturing
its complement. Enzymes having DNA polymerase activity catalyze the
formation of a bond between the 3' hydroxyl group at the growing end of a
nucleic acid primer and the 5' phosphate group of a newly added nucleotide
triphosphate. Nucleotide triphosphates used for DNA synthesis are usually
deoxyadenosine triphosphate (A), deoxythymidine triphosphate (T),
deoxycytidine triphosphate (C) and deoxyguanosine triphosphate (G), but
modified or altered versions of these nucleotides can also be used. The order in
which the nucleotides are added is dictated by hydrogen-bond formation
between A and T nucleotide bases and between G and C nucleotide bases.

Bacterial cells contain three types of DNA polymerases, termed
polymerase I, II and III. DNA polymerase I is the most abundant polymerase
and is generally responsible for certain types of DNA repair, including a
repair-like reaction that permits the joining of Okazaki fragments during DNA
replication. Polymerase I is essential for the repair of DNA damage induced by
UV irradiation and radiomimetic drugs. DNA Polymerase II is thought to play a
role in repairing DNA damage that induces the SOS response. In mutants that
lack both polymerase I and III, polymerase II repairs UV-induced lesions.
Polymerase I and II are monomeric polymerases while polymerase III is a
multisubunit complex.

Enzymes having DNA polymerase activity are often used in vitro for a
variety of biochemical applications including cDNA synthesis and DNA
sequencing reactions. See Sambrook et al., Molecular Cloning: A Laboratory
Manual (3rd ed., Cold Spring Harbor Laboratory Press, 2001), hereby
incorporated by reference. DNA polymerases are also used for amplification of
nucleic acids by methods such as the polymerase chain reaction (PCR) (Mullis et
al., U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,800,159, incorporated by
reference) and RNA transcription-mediated amplification methods (e.g., Kacian
et al., PCT Publication No. WO91/01384, incorporated by reference).

DNA amplification utilizes cycles of primer extension through the use of
a DNA polymerase activity, followed by thermal denaturation of the resulting
double-stranded nucleic acid in order to provide a new template for another
round of primer annealing and extension. Because the high temperatures
necessary for strand denaturation result in the irreversible inactivation of many
DNA polymerases, the discovery and use of DNA polymerases able to remain
active at temperatures above about 37 °C provides an advantage in cost and labor
efficiency.
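
The cycling described above can be illustrated with a short calculation. The
following is only a sketch of the idealized case: the function name, the
per-cycle efficiency parameter and the example values are assumptions made for
illustration and are not taken from this specification. Actual reactions
typically plateau well below the ideal yield.

```python
def ideal_amplicon_copies(initial_copies: int, cycles: int,
                          efficiency: float = 1.0) -> float:
    """Idealized number of template copies after thermocyclic amplification.

    efficiency is a hypothetical fraction of templates that are copied in
    each cycle; 1.0 corresponds to perfect doubling per cycle.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    # Each cycle of denaturation, annealing and extension multiplies the
    # number of templates by (1 + efficiency).
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # Ten ideal cycles starting from a single template molecule.
    print(ideal_amplicon_copies(1, 10))
```

Because every cycle includes a denaturation step, the polymerase in such a
scheme must survive repeated heating, which is why a thermostable enzyme avoids
the need to add fresh polymerase after each cycle.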

Thermostable DNA polymerases have been discovered in a number of
thermophilic organisms including Thermus aquaticus, Thermus thermophilus,
and species within the genera Bacillus, Thermococcus, Sulfolobus, and
Pyrococcus. A full length thermostable DNA polymerase derived from Thermus
aquaticus (Taq) has been described by Lawyer et al., J. Biol. Chem.
264:6427-6437 (1989) and Gelfand et al., U.S. Pat. No. 5,079,352. The cloning and
expression of truncated versions of that DNA polymerase are further described
in Lawyer et al., in PCR Methods and Applications, 2:275-287 (1993), and
Barnes, PCT Publication No. WO92/06188 (1992). Sullivan reports the cloning
of a mutated version of the Taq DNA polymerase in EPO Publication No.
0482714A1 (1992). A DNA polymerase from Thermus thermophilus has also
been cloned and expressed. Asakura et al., J. Ferment. Bioeng. (Japan),
74:265-269 (1993). However, the properties of the various polymerases vary.
Accordingly, new polymerases are needed that have improved sequence
discrimination, better salt tolerance, combined reverse transcription and DNA
polymerase activities, varying degrees of thermostability, improved tolerance for
labeled or dideoxy nucleotides and other valuable properties.

SUMMARY OF THE INVENTION 

The invention provides nucleic acid polymerase enzymes isolated from a
thermophilic organism, Thermus scotoductus. The invention provides nucleic
acid polymerases from several Thermus scotoductus strains including strain X-1
(ATCC Deposit No. 27978), strain SM3 and strain Vi7a.

In one embodiment, the invention provides an isolated nucleic acid
encoding a Thermus scotoductus nucleic acid polymerase. Such a nucleic acid
can have a polynucleotide sequence comprising any one of SEQ ID NO:1-12.
Nucleic acids complementary to any one of SEQ ID NO:1-12 are also included
within the invention. In another embodiment, the invention provides an isolated
nucleic acid encoding a polypeptide having at least 93% identity to an amino
acid sequence comprising any one of SEQ ID NO:13-28. The invention also
provides vectors comprising these isolated nucleic acids, including expression
vectors comprising a promoter operably linked to any of the isolated nucleic
acids of the invention. Host cells comprising such isolated nucleic acids and
vectors are also provided by the invention, particularly host cells capable of
expressing a thermostable polypeptide, where the polypeptide has nucleic acid
polymerase or DNA polymerase activity.

In another embodiment, the invention provides an isolated nucleic acid
encoding a derivative nucleic acid polymerase comprising any one of amino acid
sequences SEQ ID NO:13-16 having a mutation that decreases 5'-3' exonuclease
activity. Such a derivative nucleic acid polymerase has decreased 5'-3'
exonuclease activity relative to a nucleic acid polymerase comprising any one of
amino acid sequences SEQ ID NO:13-16.

In another embodiment, the invention provides an isolated nucleic acid
encoding a derivative nucleic acid polymerase comprising any one of amino acid
sequences SEQ ID NO:13-16 having a mutation that reduces discrimination
against dideoxynucleotide triphosphates. Such a derivative nucleic acid
polymerase has reduced discrimination against dideoxynucleotide triphosphates
relative to a nucleic acid polymerase comprising any one of amino acid
sequences SEQ ID NO:13-16.

The invention also provides isolated polypeptides that can include an
amino acid sequence with at least 93% identity to any one of SEQ ID NO:13-28.
The isolated polypeptides provided by the invention preferably have an amino
acid sequence with at least 95% sequence identity to any one of SEQ ID
NO:13-28. Such polypeptides can also have nucleic acid polymerase or DNA
polymerase activity. Such DNA polymerase activity can, for example, be about
50,000 U/mg protein to about 500,000 U/mg protein.

The invention further provides a method of synthesizing DNA that
includes contacting a polypeptide comprising any one of SEQ ID NO:13-28 with
a DNA under conditions sufficient to permit polymerization of DNA.

The invention also provides a method of synthesizing DNA from an
RNA template that includes contacting a polypeptide comprising any one of
SEQ ID NO:13-28 with an RNA template under conditions sufficient to permit
synthesis of DNA (e.g., reverse transcription). The invention further
provides a method for thermocyclic amplification of nucleic acid that comprises
contacting a nucleic acid with a thermostable polypeptide having any one of
SEQ ID NO:13-28 under conditions suitable for amplification of the nucleic
acid, and amplifying the nucleic acid. Such amplification can be, for example,
by Strand Displacement Amplification or Polymerase Chain Reaction.

The invention also provides a method of primer extending DNA
comprising contacting a polypeptide comprising any one of SEQ ID NO:13-28 with
a DNA under conditions sufficient to permit polymerization of DNA. Such primer
extension can be performed, for example, to sequence DNA or to amplify DNA.

The invention further provides a method of making a nucleic acid
polymerase comprising any one of SEQ ID NO:13-28, the method comprising
incubating a host cell comprising a nucleic acid that encodes a polypeptide
comprising any one of SEQ ID NO:13-28, operably linked to a promoter, under
conditions sufficient for RNA transcription and translation. In one embodiment,
the method uses a nucleic acid that comprises any one of SEQ ID NO:1-12. The
invention is also directed to a nucleic acid polymerase or DNA polymerase made
by this method.

The invention also provides a kit that includes a container containing a
nucleic acid polymerase comprising an amino acid sequence with at least 93%
identity to any one of SEQ ID NO:13-28. The kit can also contain an unlabeled
nucleotide, a labeled nucleotide, a balanced mixture of nucleotides, a chain
terminating nucleotide, a nucleotide analog, a buffer solution, a solution
containing magnesium, a cloning vector, a restriction endonuclease, a
sequencing primer, a solution containing reverse transcriptase, or a DNA or
RNA amplification primer. Such kits can, for example, be adapted for
performing DNA sequencing, DNA amplification, RNA amplification, reverse
transcription or primer extension reactions.

DESCRIPTION OF THE FIGURES 

Figure 1 provides a comparison of amino acid sequences for polymerases
from Thermus aquaticus (Taq; SEQ ID NO:48), Thermus thermophilus (Tth;
SEQ ID NO:49), Thermus filiformis (Tfi; SEQ ID NO:50) and Thermus
scotoductus strain X-1 (Tsc; SEQ ID NO:13).

Figure 2 provides a comparison of amino acid sequences for three strains
of Thermus scotoductus polymerases: strain X-1 (SEQ ID NO:13), strain SM3
(SEQ ID NO:15), and strain Vi7a (SEQ ID NO:16).

DETAILED DESCRIPTION OF THE INVENTION 

The present invention relates to nucleic acid and amino acid sequences
encoding nucleic acid polymerases from thermophilic organisms. In particular,
the present invention provides a nucleic acid polymerase from Thermus
scotoductus. The nucleic acid polymerases of the invention can be used in a
variety of procedures, including DNA synthesis, reverse transcription, DNA
primer extension, DNA sequencing and DNA amplification procedures.

Definitions 

The term "amino acid sequence" refers to the positional arrangement and 
identity of amino acids in a peptide, polypeptide or protein molecule. Use of the 
term "amino acid sequence" is not meant to limit the amino acid sequence to the
complete, native amino acid sequence of a peptide, polypeptide or protein.

"Chimeric" is used to indicate that a nucleic acid, such as a vector or a 
gene, is comprised of more than one nucleic acid segment and that at least two 
nucleic acid segments are of distinct origin. Such nucleic acid segments are 
fused together by recombinant techniques, resulting in a nucleic acid sequence
that does not occur naturally.

The term "coding region" refers to the nucleotide sequence that codes for
a protein of interest. The coding region of a protein is bounded on the 5' side by
the nucleotide triplet "ATG" that encodes the initiator methionine and on the 3'
side by one of the three triplets that specify stop codons (i.e., TAA, TAG, TGA).
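
By way of illustration only, the boundaries just described can be located in a
DNA sequence with a few lines of code. This is a minimal sketch, not a method
of the invention: the function name and the example sequence are hypothetical,
and the search simply takes the first "ATG" that is followed by an in-frame
stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_coding_region(dna: str):
    """Return (start, end) of the first ATG...stop coding region, or None.

    start is the index of the "A" of the initiator ATG; end is one past the
    last base of the in-frame stop codon, so dna[start:end] is the region.
    """
    seq = dna.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon, in frame with the candidate ATG.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3
        # No in-frame stop codon found; try the next ATG.
        start = seq.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    example = "CCATGAAATTTTAGGG"  # hypothetical sequence
    region = find_coding_region(example)
    if region is not None:
        print(example[region[0]:region[1]])
```

Note that a "TGA" overlapping the initiator codon in a different reading frame
is ignored, because only triplets in frame with the ATG are examined.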

"Constitutive expression" refers to expression using a constitutive 
promoter. 

"Constitutive promoter" refers to a promoter that is able to express the 
gene that it controls in all, or nearly all, phases of the life cycle of the cell. 

"Complementary" or "complementarity" are used to define the degree of
base-pairing or hybridization between nucleic acids. For example, as is known
to one of skill in the art, adenine (A) can form hydrogen bonds or base pair with
thymine (T) and guanine (G) can form hydrogen bonds or base pair with
cytosine (C). Hence, A is complementary to T and G is complementary to C.
Complementarity may be complete when all bases in a double-stranded nucleic
acid are base paired. Alternatively, complementarity may be "partial," in which
only some of the bases in a nucleic acid are matched according to the base
pairing rules. The degree of complementarity between nucleic acid strands has
an effect on the efficiency and strength of hybridization between nucleic acid
strands.
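
The base pairing rules above lend themselves to a brief worked example. The
sketch below is illustrative only; the function names are hypothetical, both
strands are assumed to be written 5' to 3' and to be of equal length, and only
the four standard DNA bases are handled.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the fully complementary strand, written 5' to 3'."""
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


def complementarity(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that base pair when the strands are antiparallel.

    1.0 corresponds to complete complementarity; values between 0 and 1
    correspond to partial complementarity.
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # align strand_b antiparallel to strand_a
    paired = sum(1 for x, y in zip(a, b) if PAIRS.get(x) == y)
    return paired / len(a)


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(complementarity("ATGC", "GCAT"))
    print(complementarity("ATGC", "GCAA"))
```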

The "derivative" of a reference nucleic acid, protein, polypeptide or
peptide, is a nucleic acid, protein, polypeptide or peptide, respectively, with a
related but different sequence or chemical structure than the respective reference
nucleic acid, protein, polypeptide or peptide. A derivative nucleic acid, protein,
polypeptide or peptide is generally made purposefully to enhance or incorporate
some chemical, physical or functional property that is absent or only weakly
present in the reference nucleic acid, protein, polypeptide or peptide. A
derivative nucleic acid generally can differ in nucleotide sequence from a
reference nucleic acid whereas a derivative protein, polypeptide or peptide can
differ in amino acid sequence from the reference protein, polypeptide or peptide,
respectively. Such sequence differences can be one or more substitutions,
insertions, additions, deletions, fusions and truncations, which can be present in
any combination. Differences can be minor (e.g., a difference of one nucleotide
or amino acid) or more substantial. However, the sequence of the derivative is
not so different from the reference that one of skill in the art would not recognize
that the derivative and reference are related in structure and/or function.
Generally, differences are limited so that the reference and the derivative are
closely similar overall and, in many regions, identical. A "variant" differs from
a "derivative" nucleic acid, protein, polypeptide or peptide in that the variant can
have silent structural differences that do not significantly change the chemical,
physical or functional properties of the reference nucleic acid, protein,
polypeptide or peptide. In contrast, the differences between the reference and
derivative nucleic acid, protein, polypeptide or peptide are intentional changes
made to improve one or more chemical, physical or functional properties of the
reference nucleic acid, protein, polypeptide or peptide.

The terms "DNA polymerase activity," "synthetic activity" and
"polymerase activity" are used interchangeably and refer to the ability of an
enzyme to synthesize new DNA strands by the incorporation of deoxynucleoside
triphosphates. A protein that can direct the synthesis of new DNA strands by the
incorporation of deoxynucleoside triphosphates in a template-dependent manner
is said to be "capable of DNA synthetic activity."

The term "5' exonuclease activity" refers to the presence of an activity in
a protein that is capable of removing nucleotides from the 5' end of a nucleic
acid.

The term "3' exonuclease activity" refers to the presence of an activity in
a protein that is capable of removing nucleotides from the 3' end of a nucleic
acid.

"Expression" refers to the transcription and/or translation of an 
endogenous or exogenous gene in an organism. Expression generally refers to
the transcription and stable accumulation of mRNA. Expression may also refer
to the production of protein.

"Expression cassette" means a nucleic acid sequence capable of directing 
expression of a particular nucleotide sequence. Expression cassettes generally 

comprise a promoter operably linked to the nucleotide sequence to be expressed
(e.g., a coding region) that is operably linked to termination signals. Expression
cassettes also typically comprise sequences required for proper translation of the
nucleotide sequence. The expression cassette comprising the nucleotide
sequence of interest may be chimeric, meaning that at least one of its
components is heterologous with respect to at least one of its other components.
The expression of the nucleotide sequence in the expression cassette may be
under the control of a constitutive promoter or of an inducible promoter that
initiates transcription only when the host cell is exposed to some particular
external stimulus. In the case of a multicellular organism, the promoter can also
be specific to a particular tissue or organ or stage of development.

The term "gene" is used broadly to refer to any segment of nucleic acid
associated with a biological function. The term "gene" encompasses the coding
region of a protein, polypeptide, peptide or structural RNA. The term
"gene" also includes sequences up to a distance of about 2 kb on either end of a
coding region. These sequences are referred to as "flanking" sequences or
regions (these flanking sequences are located 5' or 3' to the non-translated
sequences present on the mRNA transcript). The 5' flanking region may contain
regulatory sequences such as promoters and enhancers or other recognition or
binding sequences for proteins that control or influence the transcription of the
gene. The 3' flanking region may contain sequences that direct the termination
of transcription, post-transcriptional cleavage and polyadenylation as well as
recognition sequences for other proteins. A protein or polypeptide encoded in a
gene can be full length or any portion thereof, so that all activities or functional
properties are retained, or so that only selected activities (e.g., enzymatic
activity, ligand binding, or signal transduction) of the full-length protein or
polypeptide are retained. The protein or polypeptide can include any sequences
necessary for the production of a proprotein or precursor polypeptide. The term
"native gene" refers to a gene that is naturally present in the genome of an
untransformed cell.

"Genome" refers to the complete genetic material that is naturally present 
in an organism and is transmitted from one generation to the next. 
The terms "heterologous nucleic acid," or "exogenous nucleic acid" refer
to a nucleic acid that originates from a source foreign to the particular host cell
or, if from the same source, is modified from its original form. Thus, a
heterologous gene in a host cell includes a gene that is endogenous to the
particular host cell but has been modified through, for example, the use of DNA
shuffling. The terms also include non-naturally occurring multiple copies of a
naturally occurring nucleic acid. Thus, the terms refer to a nucleic acid segment
that is foreign or heterologous to the cell, or normally found within the cell but
in a position within the cell or genome where it is not ordinarily found.

The term "homology" refers to a degree of similarity between a nucleic
acid and a reference nucleic acid or between a polypeptide and a reference
polypeptide. Homology may be partial or complete. Complete homology
indicates that the nucleic acid or amino acid sequences are identical. A partially
homologous nucleic acid or amino acid sequence is one that is not identical to
the reference nucleic acid or amino acid sequence. Hence, a partially
homologous nucleic acid has one or more nucleotide differences in its sequence
relative to the nucleic acid to which it is being compared. The degree of
homology can be determined by sequence comparison. Alternatively, as is
understood by those skilled in the art, DNA-DNA or DNA-RNA hybridization,
under various hybridization conditions, can provide an estimate of the degree of
homology between nucleic acids (see, e.g., Hames and Higgins (eds.), Nucleic
Acid Hybridization, IRL Press, Oxford, U.K.).

"Hybridization" refers to the process of annealing complementary nucleic 
acid strands by forming hydrogen bonds between nucleotide bases on the 
complementary nucleic acid strands. Hybridization, and the strength of the
association between the nucleic acids, is impacted by such factors as the degree
of complementarity between the hybridizing nucleic acids, the stringency of the
conditions involved, the Tm of the formed hybrid, and the G:C ratio within the
nucleic acids.
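
To make the dependence on base composition concrete, the sketch below computes
the G+C fraction of a sequence and a rough melting temperature estimate. It is
illustrative only: the function names are hypothetical, and the estimate uses
the commonly cited rule of thumb of 2 °C per A or T and 4 °C per G or C (often
called the Wallace rule), which is an assumption introduced here rather than a
formula from this specification. That rule is generally applied only to short
oligonucleotides and ignores salt and strand concentration.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    s = seq.upper()
    if not s:
        raise ValueError("sequence must not be empty")
    return (s.count("G") + s.count("C")) / len(s)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb melting temperature in degrees C for a short oligo.

    Uses Tm = 2 * (A + T) + 4 * (G + C); an approximation only.
    """
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    oligo = "ATGCATGCATGCAT"  # hypothetical 14-mer
    print(gc_fraction(oligo), wallace_tm(oligo))
```

Under this approximation a G:C-rich hybrid has a higher estimated Tm than an
A:T-rich hybrid of the same length, consistent with the stronger association
attributed above to G:C content.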

"Inducible promoter" refers to a regulated promoter that can be turned on 
in one or more cell types by an external stimulus, such as a chemical, light,
hormone, stress, temperature or a pathogen.

An "initiation site" is a region surrounding the position of the first
nucleotide that is part of the transcribed sequence, which is defined as position
+1. All nucleotide positions of the gene are numbered by reference to the first
nucleotide of the transcribed sequence, which resides within the initiation site.
Downstream sequences (i.e., sequences in the 3' direction) are denominated
positive, while upstream sequences (i.e., sequences in the 5' direction) are
denominated negative.

An "isolated" or "purified" nucleic acid or an "isolated" or "purified" 
polypeptide is a nucleic acid or polypeptide that, by the hand of man, exists apart
from its native environment and is therefore not a product of nature. An isolated
nucleic acid or polypeptide may exist in a purified form or may exist in a
non-native environment such as, for example, a transgenic host cell.

The term "invader oligonucleotide" refers to an oligonucleotide that
contains sequences at its 3' end that are substantially the same as sequences
located at the 5' end of a probe oligonucleotide. These regions will compete for
hybridization to the same segment along a complementary target nucleic acid.

The term "label" refers to any atom or molecule that can be used to
provide a detectable (preferably quantifiable) signal, and that can be attached to
a nucleic acid or protein. Labels may provide signals detectable by fluorescence,
radioactivity, colorimetry, gravimetry, X-ray diffraction or absorption,
magnetism, enzymatic activity, and the like.

The term "nucleic acid" refers to deoxyribonucleotides or ribonucleotides
and polymers thereof in either single- or double-stranded form, composed of
monomers (nucleotides) containing a sugar, phosphate and a base that is either a
purine or pyrimidine. Unless specifically limited, the term encompasses nucleic
acids containing known analogs of natural nucleotides that have similar binding
properties as the reference nucleic acid and are metabolized in a manner similar
to naturally occurring nucleotides. Unless otherwise indicated, a particular
nucleic acid sequence also implicitly encompasses conservatively modified
variants thereof (e.g., degenerate codon substitutions) and complementary
sequences as well as the reference sequence explicitly indicated.

The term "oligonucleotide" as used herein is defined as a molecule
comprised of two or more deoxyribonucleotides or ribonucleotides, preferably
more than three, and usually more than ten. There is no precise upper limit on
the size of an oligonucleotide. However, in general, an oligonucleotide is shorter
than about 250 nucleotides, preferably shorter than about 200 nucleotides and
more preferably shorter than about 100 nucleotides. The exact size will depend
on many factors, which in turn depend on the ultimate function or use of the
oligonucleotide. The oligonucleotide may be generated in any manner, including
chemical synthesis, DNA replication, reverse transcription, or a combination
thereof.

The terms "open reading frame" and "ORF" refer to the amino acid
sequence encoded between translation initiation and termination codons of a
coding sequence. The terms "initiation codon" and "termination codon" refer to
a unit of three adjacent nucleotides ('codon') in a coding sequence that specifies
initiation and chain termination, respectively, of protein synthesis (mRNA
translation).

"Operably linked" means joined as part of the same nucleic acid 
molecule, so that the function of one is affected by the other. In general, 
"operably linked" also means that two or more nucleic acids are suitably 
positioned and oriented so that they can function together. Nucleic acids are
often operably linked to permit transcription of a coding region to be initiated
from the promoter. For example, a regulatory sequence is said to be "operably
linked to" or "associated with" a nucleic acid sequence that codes for an RNA or
a polypeptide if the two sequences are situated such that the regulatory sequence
affects expression of the coding region (i.e., that the coding sequence or
functional RNA is under the transcriptional control of the promoter). Coding
regions can be operably linked to regulatory sequences in sense or antisense
orientation.

The term "probe oligonucleotide" refers to an oligonucleotide that
interacts with a target nucleic acid to form a cleavage structure in the presence or
absence of an invader oligonucleotide. When annealed to the target nucleic acid,
the probe oligonucleotide and target form a cleavage structure and cleavage
occurs within the probe oligonucleotide. The presence of an invader
oligonucleotide upstream of the probe oligonucleotide can shift the site of
cleavage within the probe oligonucleotide (relative to the site of cleavage in the
absence of the invader).

"Promoter" refers to a nucleotide sequence, usually upstream (5') to a
coding region, which controls the expression of the coding region by providing
the recognition site for RNA polymerase and other factors required for proper
transcription. "Promoter" includes but is not limited to a minimal promoter that
is a short DNA sequence comprised of a TATA box. Hence, a promoter includes
other sequences that serve to specify the site of transcription initiation and
control or regulate expression, for example, enhancers. Accordingly, an
"enhancer" is a segment of DNA that can stimulate promoter activity and may be
an innate element of the promoter or a heterologous element inserted to enhance
the level or tissue specificity of a promoter. It is capable of operating in both
orientations (normal or flipped), and is capable of functioning even when moved
either upstream or downstream from the promoter. Promoters may be derived in
their entirety from a native gene, or be composed of different elements derived
from different promoters found in nature, or even be comprised of synthetic
DNA segments. A promoter may also contain DNA segments that are involved
in the binding of protein factors that control the effectiveness of transcription
initiation in response to physiological or developmental conditions.

The terms "protein," "peptide" and "polypeptide" are used 
interchangeably herein. 

"Regulatory sequences" and "regulatory elements" refer to nucleotide 
sequences that control some aspect of the expression of nucleic acid sequences.
Such sequences or elements can be located upstream (5' non-coding sequences),
within, or downstream (3' non-coding sequences) of a coding sequence.
"Regulatory sequences" and "regulatory elements" influence the transcription,
RNA processing or stability, or translation of the associated coding sequence.
Regulatory sequences include enhancers, introns, promoters, polyadenylation
signal sequences, splicing signals, termination signals, and translation leader
sequences. They include natural and synthetic sequences.

As used herein, the term "selectable marker" refers to a gene that encodes
an observable or selectable trait that is expressed and can be detected in an
organism having that gene. Selectable markers are often linked to a nucleic acid
of interest that may not encode an observable trait, in order to trace or select the
presence of the nucleic acid of interest. Any selectable marker known to one of
skill in the art can be used with the nucleic acids of the invention. Some
selectable markers allow the host to survive under circumstances where, without
the marker, the host would otherwise die. Examples of selectable markers
include antibiotic resistance, for example, tetracycline or ampicillin resistance.

As used herein, the term "stringency" is used to define the conditions of
temperature, ionic strength, and the presence of other compounds such as
organic solvents, under which nucleic acid hybridizations are conducted. With
"high stringency" conditions, nucleic acid base pairing will occur only between
nucleic acids that have a high frequency of complementary base sequences. With
"weak" or "low" stringency conditions, the frequency of
complementary sequences is usually less, so that nucleic acids with differing
sequences can be detected and/or isolated.

The terms "substantially similar" and "substantially homologous" refer to
nucleotide and amino acid sequences that represent functional equivalents of the
instant inventive sequences. For example, altered nucleotide sequences that
simply reflect the degeneracy of the genetic code but nonetheless encode amino
acid sequences that are identical to the inventive amino acid sequences are
substantially similar to the inventive sequences. In addition, amino acid
sequences that are substantially similar to the instant sequences are those
wherein overall amino acid identity is sufficient to provide an active, thermally
stable nucleic acid polymerase. For example, amino acid sequences that are
substantially similar to the sequences of the invention are those wherein the
overall amino acid identity is 80% or greater, preferably 90% or greater, such as
91%, 92%, 93%, or 94%, and more preferably 95% or greater, such as 96%,
97%, 98%, or 99% relative to the amino acid sequences of the invention.
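
The identity thresholds above can be illustrated with a short calculation. The
sketch below is an illustration under stated assumptions and is not the
comparison method of this specification: the function names are hypothetical,
the two sequences are assumed to be already aligned to equal length with "-"
marking gaps, and identity is taken as identical positions divided by the
alignment length. Other conventions for the denominator exist and give
somewhat different values.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows."""
    if not aligned_a or len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must be non-empty and equal length")
    # A column counts as identical only if both residues match and
    # neither is a gap character.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b)
                  if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def similarity_tier(identity: float) -> str:
    """Map a percent identity onto the 80%, 90% and 95% tiers named above."""
    if identity >= 95.0:
        return "more preferred (95% or greater)"
    if identity >= 90.0:
        return "preferred (90% or greater)"
    if identity >= 80.0:
        return "substantially similar (80% or greater)"
    return "below 80%"


if __name__ == "__main__":
    # Hypothetical 20-residue alignment differing at one position.
    a = "MKLVAGTRSPEDHQNWYFIC"
    b = "MKLVAGTRSPEDHQNWYFIA"
    pid = percent_identity(a, b)
    print(pid, similarity_tier(pid))
```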

A "terminating agent," "terminating nucleotide" or "terminator" in
relation to DNA synthesis or sequencing refers to compounds capable of
specifically terminating a DNA sequencing reaction at a specific base. Such
compounds include, but are not limited to, dideoxynucleosides having a 2', 3'
dideoxy structure (e.g., ddATP, ddCTP, ddGTP and ddTTP).

"Thermostable" means that a nucleic acid polymerase remains active at a 
temperature greater than about 37 °C. Preferably, the nucleic acid polymerases
of the invention remain active at a temperature greater than about 42 °C. More
preferably, the nucleic acid polymerases of the invention remain active at a
temperature greater than about 50 °C. Even more preferably, the nucleic acid
polymerases of the invention remain active after exposure to a temperature
greater than about 60 °C. Most preferably, the nucleic acid polymerases of the
invention remain active despite exposure to a temperature greater than about
70 °C.

A "transgene" refers to a gene that has been introduced into the genome
by transformation and is stably maintained. Transgenes may include, for
example, genes that are either heterologous or homologous to the genes of a
particular organism to be transformed. Additionally, transgenes may comprise
native genes inserted into a non-native organism, or chimeric genes. The term
"endogenous gene" refers to a native gene in its natural location in the genome
of an organism. A "foreign" or "exogenous" gene refers to a gene not normally
found in the host organism but one that is introduced by gene transfer.

The term "transformation" refers to the transfer of a nucleic acid
fragment into the genome of a host cell, resulting in genetically stable
inheritance. Host cells containing the transformed nucleic acid fragments are
referred to as "transgenic" cells, and organisms comprising transgenic cells are
referred to as "transgenic organisms." Transformation may be accomplished by
a variety of means known to the art including calcium DNA co-precipitation,
electroporation, viral infection, and the like.

The "variant" of a reference nucleic acid, protein, polypeptide or peptide
is a nucleic acid, protein, polypeptide or peptide, respectively, with a related but
different sequence than the respective reference nucleic acid, protein,
polypeptide or peptide. The differences between variant and reference nucleic
acids, proteins, polypeptides or peptides are silent or conservative differences.
A variant nucleic acid differs in nucleotide sequence from a reference nucleic
acid, whereas a variant protein, polypeptide or peptide differs in
amino acid sequence from the reference protein, polypeptide or peptide,
respectively. A variant and reference nucleic acid, protein, polypeptide or
peptide may differ in sequence by one or more substitutions, insertions,
additions, deletions, fusions and truncations, which may be present in any
combination. Differences can be minor (e.g., a difference of one nucleotide or
amino acid) or more substantial. However, the structure and function of the
variant is not so different from the reference that one of skill in the art would not
recognize that the variant and reference are related in structure and/or function.
Generally, differences are limited so that the reference and the variant are closely
similar overall and, in many regions, identical.

The term "vector" is used to refer to a nucleic acid that can transfer
another nucleic acid segment(s) into a cell. A "vector" includes, inter alia, any
plasmid, cosmid, phage or nucleic acid in double- or single-stranded, linear or
circular form that may or may not be self-transmissible or mobilizable. It can
transform prokaryotic or eukaryotic host cells either by integration into the
cellular genome or by existing extrachromosomally (e.g., an autonomously
replicating plasmid with an origin of replication). Vectors used in bacterial
systems often contain an origin of replication that allows the vector to replicate
independently of the bacterial chromosome. The term "expression vector" refers
to a vector containing an expression cassette.

The term "wild-type" refers to a gene or gene product that has the
characteristics of that gene or gene product when isolated from a naturally
occurring source. A wild-type gene is the gene form most frequently observed
in a population and thus arbitrarily is designated the "normal" or "wild-type" form
of the gene. In contrast, the term "variant" or "derivative" refers to a gene or
gene product that displays modifications in sequence and/or functional properties
(i.e., altered characteristics) when compared to the wild-type gene or gene
product. Naturally-occurring derivatives can be isolated. They are identified by
the fact that they have altered characteristics when compared to the wild-type
gene or gene product.

Polymerase Nucleic Acids 

The invention provides isolated nucleic acids encoding Thermus
scotoductus nucleic acid polymerases as well as derivatives, fragments and
variant nucleic acids thereof that encode an active, thermally stable nucleic acid
polymerase. Thus, one aspect of the invention includes the nucleic acid
polymerases encoded by the polynucleotide sequences contained in Thermus
scotoductus strain X-1 (ATCC Deposit No. 27978). Another aspect of the
invention provides the nucleic acid polymerases of Thermus scotoductus strains
SM3 and Vi7a. Any nucleic acid encoding any one of amino acid sequences
SEQ ID NO:13-28, which are amino acid sequences for wild type and several
derivative Thermus scotoductus nucleic acid polymerases, is also contemplated
by the present invention.

In one embodiment, the invention provides a nucleic acid of SEQ ID
NO:1, a wild type Thermus scotoductus, strain X-1, nucleic acid encoding a
nucleic acid polymerase.

ATGAGGGCGA TGCTGCCCCT CTTTGAGCCC AAGGGCCGGG  40
TGCTTCTGGT GGACGGCCAC CACCTGGCGT ACCGTACCTT  80
TTTTGCCCTG AAGGGCCTCA CCACCAGCCG CGGGGAGCCG  120
GTCCAGGCGG TGTACGGGTT TGCCAAGAGC CTTTTGAAGG  160
CGCTAAGGGA AGACGGGGAT GTGGTGATCG TGGTGTTTGA  200
CGCCAAGGCC CCCTCCTTCC GCCACCAGAC CTACGAGGCC  240
TACAAGGCGG GGCGGGCTCC CACCCCCGAG GACTTTCCCC  280
GGCAGCTTGC CCTTATCAAG GAGATGGTGG ACCTTTTGGG  320
CCTGGAGCGC CTCGAGGTGC CGGGCTTTGA GGCGGATGAC  360
GTCCTGGCTA CCCTGGCCAA GAAGGCGGAA AAGGAAGGCT  400
ACGAGGTGCG CATCCTCACC GCGGACCGGG ACCTTTACCA  440
GCTTCTTTCG GAGCGAATCT CCATCCTTCA CCCGGAGGGT  480
TACCTGATCA CCCCGGAGTG GCTTTGGGAG AAGTATGGGC  520
TTAAGCCTTC CCAGTGGGTG GACTACCGGG CCTTGGCCGG  560
GGACCCTTCC GACAACATCC CCGGCGTGAA GGGCATCGGG  600
GAGAAGACGG CGGCCAAGCT GATCCGGGAG TGGGGAAGCC  640
TGGAAAACCT TCTTAAGCAC CTGGAACAGG TGAAACCTGC  680
CTCCGTGCGG GAGAAGATCC TTAGCCACAT GGAGGACCTC  720
AAGCTATCCC TGGAGCTATC CCGGGTGCGC ACGGACTTGC  760
CCCTTCAGGT GGACTTGGCC CGGCGCCGGG AGCCGGACCG  800
GGAGGGGCTT AAGGCCTTTT TGGAGAGGCT GGAGTTCGGA  840
AGCCTCCTCC ACGAGTTCGG CCTGTTGGAA AGCCCGGTGG  880
CGGCGGAGGA AGCTCCCTGG CCGCCCCCCG AGGGAGCCTT  920
CGTGGGGTAC GTTCTTTCCC GCCCCGAGCC CATGTGGGCG  960
GAGCTTAACG CCTTGGCCGC CGCCTGGGAG GGAAGGGTTT  1000
ACCGGGCGGA GGATCCCTTG GAGGCCTTGC GGGGGCTTGG  1040
GGAGGTGAGG GGGCTTTTGG CCAAGGACCT GGCGGTGCTG  1080
GCCCTGAGGG AAGGGATTGC CCTGGCACCG GGCGACGACC  1120
CCATGCTCCT CGCCTACCTC CTGGATCCTT CCAACACCGC  1160
CCCCGAAGGG GTAGCCCGGC GCTACGGGGG GGAGTGGACC  1200
GAGGAGGCGG GGGAAAGGGC GTTGCTTTCC GAAAGGCTTT  1240
ACGCCGCCCT CCTGGAGCGG CTTAAGGGGG AGGAGAGGCT  1280
TCTTTGGCTT TACGAGGAGG TGGAAAAGCC CCTTTCGCGG  1320
GTCCTGGCCC ACATGGAGGC CACGGGGGTA CGGTTGGATG  1360
TGGCCTACTT AAAGGCCCTT TCCCTGGAGG TGGAGGCGGA  1400
GCTCAGGCGC CTCGAGGAGG AGGTCCACCG CCTGGCCGGG  1440
CATCCTTTCA ACCTGAACTC CCGGGACCAG CTGGAAAGGG  1480
TCCTCTTTGA CGAGCTTGGG CTTCCCGCCA TCGGCAAGAC  1520
GGAGAAGACG GGCAAGCGCT CCACCAGCGC CGCCGTTTTG  1560
GAGGCCTTGC GGGAGGCTCA TCCCATCGTG GACCGCATCC  1600
TTCAGTACCG GGAGCTTTCC AAGCTCAAGG GAACCTACAT  1640
CGATCCCTTG CCTGCCCTGG TCCACCCCAA GACGAACCGC  1680
CTCCACACCC GTTTCAACCA GACGGCCACC GCCACGGGGA  1720
GGCTTAGCAG CTCGGATCCC AACCTGCAAA ATATCCCCGT  1760
GCGCACCCCT TTGGGCCAGC GGATCCGCCG GGCCTTCGTG  1800
GCCGAGGAGG GGTGGAGGCT GGTGGTTTTG GACTACAGCC  1840
AGATTGAGCT CAGGGTCCTG GCGCACCTTT CCGGGGACGA  1880
GAACCTAATC CGGGTCTTCC AGGAGGGCCA GGACATCCAC  1920
ACCCAGACGG CCAGCTGGAT GTTCGGCGTG CCCCCAGAGG  1960
CCGTGGATTC CCTGATGCGT CGGGCGGCCA AGACCATCAA  2000
CTTCGGCGTC CTCTACGGCA TGTCCGCCCA CCGGCTTTCG  2040
GGAGAGCTGG CCATCCCCTA CGAGGAGGCG GTGGCCTTCA  2080
TCGAGCGGTA TTTCCAGAGC TACCCCAAGG TGCGGGCCTG  2120
GATTGAGAAA ACCCTGGCGG AAGGACGGGA ACGGGGCTAT  2160
GTGGAAACCC TCTTTGGCCG CCGGCGCTAC GTGCCCGACT  2200
TGGCTTCCCG GGTGAAGAGC ATCCGGGAGG CAGCGGAGCG  2240
CATGGCCTTC AACATGCCGG TCCAGGGGAC CGCCGCGGAT  2280
TTGATGAAAC TGGCCATGGT GAAGCTCTTT CCCAGGCTTC  2320
AGGAGCTGGG GGCCAGGATG CTTTTGCAGG TGCACGACGA  2360
ACTGGTCCTC GAGGCTCCCA AGGAGCAAGC GGAGGAAGTC  2400
GCCCAGGAGG CCAAGCGGAC CATGGAGGAG GTGTGGCCCC  2440
TGAAGGTGCC CTTGGAGGTG GAAGTGGGCA TCGGGGAGGA  2480
CTGGCTTTCC GCCAAGGCCT AG

In another embodiment, the invention provides nucleic acids encoding a
wild type nucleic acid polymerase from Thermus scotoductus, strain SM3,
having, for example, SEQ ID NO:2.



ATGAGGGCGA 


TGCTGCCCCT 


CTTTGAGCCC 


AAGGGCCGGG 


40 


TGCTTCTGGT 


GGACGGCCAC 


CACCTGGCCT 


ACCGTACCTT 


80 


TTTTGCCCTG 


AAGGGCCTCA 


CCACCAGCCG 


CGGGGAGCCG 


120 


GTCCAGGCGG 


TGTACGGGTT 


TGCCAAGAGC 


CTTTTGAAGG 


160 


CGCTAAGGGA 


AGACGGGGAT 


GTGGTGATCG 


TGGTGTTTGA 


200 


CGCCAAGGCC 


CCCTCCTTCC 


GCCACCAGAC 


CTACGAGGCC 


240 


TACAAGGCGG 


GGCGGGCTCC 


CACCCCCGAG 


GACTTTCCCC 


280 


GGCAGCTTGC 


CCTTATCAAG 


GAGATGGTGG 


ACCTTTTGGG 


320 


CCTGGAGCGC 


CTCGAAGTGC 


CGGGTTTTGA 


GGCGGATGAC 


360 


GTCCTGGCCA 


CCCTGGCCAA 


GAAGGCGGAA 


AAGGAAGGCT 


400 


ACGAGGTGCG 


CATCCTCACC 


GCGGACCGGG 


ACCTTTACCA 


440 


GCTTCTTTCG 


GACCGAATCT 


CCATCCTTCA 


CCCGGAGGGT 


480 


TACCTGATCA 


CCCCGGAGTG 


GCTTTGGGAG 


AAGTATGGGC 


520 


TTAAGCCTTC 


CCAGTGGGTG 


GACTACCGGG 


CCTTGGCCGG 


560 


GGACCCTTCC 


GACAACATCC 


CCGGCGTGAA 


GGGCATCGGG 


600 


GAGAAGACGG 


CGGCCAAGCT 


GATCCGGGAG 


TGGGGAAGCC 


640


TGGAAAACCT 


TCTTAAGCAC 


CTGGAACAGG 


TGAAACCTGC 


680 
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CTCCGTGCGG GAGAAGATCC 
AAGCTATCCC TGGAGCTTTC 
CCCTTCAGGT GGACTTCGCC 
GGAAGGGCTT AAGGCCTTTT 
5 AGCCTCCTCC ACGAGTTCGG 
CGGCGGAGGA AGCTCCCTGG 
CGTGGGGTAC GTTCTTTCCC 
GAGCTTAACG CCTTGGCCGC 
ACCGGGCGGA GGATCCCTTG 

10 GGAGGTGAGG GGGCTTTTGG 
GCCCTGAGGG AAGGGATTGC 
CCATGCTCCT CGCCTACCTC 
CCCCGAAGGG GTAGCCCGGC 
GAGGAGGCGG GGGAAAGGGC 

15 ACGCCGCCCT CCTGGAGCGG 
TCTTTGGCTT TACGAGGAGG 
GTCCTGGCCC ACATGGAGGC 
TGGCCTACTT GAAGGCCCTT 
GCTCAGGCGC CTCGAGGAGG 

20 CATCCTTTCA ACCTGAACTC 
TCCTCTTTGA CGAGCTTGGG 
GGAGAAGACG GGTAAGCGTT 
GAGGCTTTGA GGGAGGCTCA 
TCCAGTACCG GGAGCTTTCC 

25 CGATCCCTTG CCCGCCCTGG 
CTCCACACCC GTTTCAACCA 
GGCTTAGCAG CTCGGATCCC 
GCGCACCCCT TTAGGCCAGC 
GCCGAGGAGG GGTGGAGGCT 

30 AGATTGAGCT CAGGGTCCTG 
GAACCTGATC CGGGTCTTCC 
ACCCAGACGG CCAGCTGGAT 
CCGTGGATTC CCTGATGCGC 



TTAGPPAPAT 


GGAGGACCTC 


720 




ACGGAGTTGC 


760 




AGCCGGACCG 


800 


TGGAGAGGPT 


GGAGTTCGGA 


840 


PPTGTTGGAA 


AGCCCGGTGG 


880 


ccaccccccci 


AGGGAGCCTT 


920 


\J V_ V— V_* V— \3-tX\J K~ V- 


PATGTGGGPG 


960 

J U v 




PGA A GGGTTT 


1000 

X \J W \J 




GGGGGPTTGG 


1 O40 

X v_/ *i w 


pp a a nna ppt 


PPPGPJTGPTG 

uyLOO X X w 


j OfiO 




GGPG A PGAPP 


1120 


ptgg a tpptt 


PPAAPAPPGP 


1160 


GPT A PGGGGG 


GG AGTGG AC P 


1200 

X. a V w 


VjV- X VJ7V_ X X X V—V— 


GAAAGGPTTT 


1240 

-L ~ \J 




AGGAGAGGPT 

■H.U \jI\\J£\ O O L. X 


1 2 ft 0 
X ^ o u 


tv^a a a a nrr 


PPTTTP/^PPP 


1 2 0 
j. j ^ \j 


r* a rr i f2r , r i r t, T a 


X oo X X uuri X Vj 


1 JO v 


1 V~<wV_ X LsVji-Vvjvjj 


Tnnannp^a 

x ovj.M.vjVji^ovj.rt. 


1 4 OO 
X *± v 


iv rjriTrr a ppg 


A PTGGPPGGG 


x. r± w 


ppggg a pp zv g 


PTGGAAAGGG 


T 4 SO 
x. "x o vy 




TPGGP A AG AP 


1 R20 


p p zv pp zv ppgp 


pnppnTTTTn 


X -5 O \J 


t p p p a t zv two. 


GAPPGPATPP 


T fiOO 


A A HPTP ZV ZV CLCX 


GAAPGTAPAT 


1 fi40 

X u 


thc* a nncn a a 


GAPGA APPGP 


1 fiftO 

lOOU 


ha pnrippa pp 


GPP A PGGGGA 


1 720 

X. / <C* \J 


a a ccmr* A A A 


ATATPPPPGT 


1 760 


X L. v-Vj V- \- Vj 


GGPPTTPGTG 


1 ftOO 


GGTGGTTTTG 


GACTACAGCC 


1840 


GCGCACCTTT 


CCGGGGACGA 


1880 


AAGAGGGCCA 


GGACATCCAC 


1920 


GTTCGGCGTG 


CCCCCAGAGG 


1960 


CGGGCGGCCA 


AGACCATCAA 


2000 







V* X X VJ ± \— ' 


CTPTACGGOA 


TPTPPGPPPA 


(TGGCTTTr'G 


9 04 0 




PPATPPPPTA 




GTGGPPTTPA 


9 OflO 

A v O \J 


TPG A fJPfifiTA 


TTTPPAGAGP 


TAPPPPAAGG 


TA PGGGPPTG 


91 9 0 

X \J 




A PPPTPnppn 


a appapgpga 






ptppa a a ppp 


T P TT TPP P PP 


PPPPPGPT AT 


PTPPPPP A PT 


9 9 OO 




PPTPA APAPP 


ATPPPPPAPP 




Z Z *t u 




A A P A TPPPPP 




PPP PPPPP A T 


noon 
z z o u 


TTPATPA A AP 




P A aPPTPTTT 


\^\~\~*J±\j\j\*. XXL, 


Z _5 Z U 


AGGAGCTGGG 


GGCCAGGATG 


CTTTTGCAGG 


TGCACGACGA 


2360 


ACTGGTCCTC 


GAGGCTCCCA 


AGGAGCAAGC 


GGAGGAAGTC 


2400 


GCCCAGGAGG 


CCAAGCGGAC 


CATGGAGGAG 


GTGTGGCCCC 


2440 


TGAAGGTGCC 


CTTGGAGGTG 


GAGGTGGGTA 


TCGGGGAGGA 


2480 


CTGGCTTTCC 


GCCAAGGCCT 


AGTCGAC

In another embodiment, the invention provides nucleic acids encoding a
wild type nucleic acid polymerase from Thermus scotoductus, strain Vi7a,
having, for example, SEQ ID NO:3.



ATGAGGGCGA TGCTGCCCCT 
20 TGCTTCTGGT GGACGGCCAC 
TTTTGCCCTG AAGGGCCTCA 
GTCCAGGCGG TGTACGGGTT 
CGCTAAGGGA AGACGGGGAT 
CGCCAAGGCC CCCTCCTTCC 
25 TACAAGGCGG GGCGGGCTCC 
GGCAGCTTGC CCTTATCAAG 
CCTGGAGCGC CTCGAAGTGC 
GTCCTGGCCA CCCTGGCCAA 
ACGAGGTGCG CATCCTCACC 
30 GCTTCTTTCG GACCGAATCT 
TACCTGATTA CCCCGGAGTG 
TTAAGCCTTC CCAGTGGGTG 
GGACCCTTCC GACAACATCC 



CTTTGAGCCC 


AAGGGCCGGG 


40 


CACCTGGCCT 


ACCGTACCTT 


80 


CCACCAGCCG 


CGGGGAGCCG 


120 


TGCCAAGAGC 


CTTTTGAAGG 


160 


GTGGTGATCG 


TGGTGTTTGA 


200 


GCCACCAGAC 


CTACGAGGCC 


240 


CACCCCCGAG 


GACTTTCCCC 


280 


GAGATGGTGG 


ACCTTTTGGG 


320 


CGGGTTTTGA 


GGCGGATGAC 


360 


GAAGGCGGAA 


AAGGAAGGCT 


400 


GCGGACCGGG 


ACCTTTACCA 


440 


CCATCCTTCA 


CCCGGAGGGT 


480 


GCTTTGGGAG 


AAGTATGGGC 


520 


GACTACCGGG 


CCTTGGCCGG 


560 


CCGGCGTGAA 


GGGCATCGGG 


600 
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GAGAAGACGG 


CGGCCAAGCT 


GATCCGGGAG 


TGGGGAAGCC 


640 




TGGAAAACCT 


TCTTAAGCAC 


CTGGAACAGG 


TGAAACCTGC 


680 




CTCCGTGCGG 


GAGAAGATCC 


TTAGCCACAT 


GGAGGACCTC 


720 




AAGCTATCCC 


TGGAGCTTTC 


CCGGGTGCAC 


ACGGAGTTGC 


760 


5 


CCCTTCAGGT 


GGACTTCGCC 


CGGCGCCGGG 


AGCCGGACCG 


800 




GGAAGGGCTT 


AAGGCCTTTT 


TGGAGAGGCT 


GGAGTTCGGA 


840 




AGCCTCCTCC 


ACGAGTTCGG 


CCTGTTGGAA 


AGCCCGGTGG 


880 




CGGCGGAGGA 


AGCTCCCTGG 


CCGCCCCCCG 


AGGGAGCCTT 


920 




CGTGGGGTAC 


GTTCTTTCCC 


GCCCCGAGCC 


CATGTGGGCG 


960 


10 


GAGCTTAACG 


CCTTGGCCGC 


CGCCTGGGAG 


GGAAGGGTTT 


1000 




ACCGGGCGGA 


GGATCCCTTG 


GAGGCCTTGC 


GGGGGCTTGG 


1040 




GGAGGTGAGG 


GGGCTTTTGG 


CCAAGGACCT 


GGCGGTGCTG 


1080 




GCCCTGAGGG 


AAGGGATTGC 


CCTGGCACCG 


GGCGACGACC 


1120 




CCATGCTCCT 


CGCCTACCTC 


CTGGATCCTT 


CCAACACCGC 


1160 


15 


CCCCGAAGGG 


GTAGCCCGGC 



GCTACGGGGG 



GGAGTGGACC


1200 




GAGGAGGCGG 


GGGAAAGGGC 


GCTGCTTTCC 


GAAAGGCTTT 


1240 




ACGCCGCCCT 


CCTGGAGCGG 


CTTAAGGGGG 


AGGAGAGGCT 


1280 




TCTTTGGCTT 



TACGAGGAGG 


TGGAAAAGCC 


CCTTTCGCGG 


1320 




GTCCTGGCCC 



ACATGGAGGC 


CACGGGGGTA 


TGGTTGGATG 


1360 


20 


TGGCCTACTT 


GAAGGCCCTT 


TCCCTGGAGG 


TGGAGGCGGA 


1400 




GCTCAGGCGC 


CTCGAGGAGG 


AGGTCCACCG 


ACTGGCCGGG 


1440 




CATCCTTTCA 



ACCTGAACTC 


CCGGGACCAG 


CTGGAAAGGG 


1480 




TCCTCTTTGA 


CGAGCTTGGG 


CTTCCCGCCA 


TCGGCAAGAC 


1520 




GGAGAAGACG 


GGTAAGCGTT 



CCACCAGCGC 


CGCCGTTTTG 


1560 


25 


GAGGCTTTGA 



GGGAGGCTCA


TCCCATAGTG 

X> ■ Xm*XT X Xj * *V X- \J 


GACCGCATCC 


1600 




TCCAGTACCG 


GGAGCTTTCC 


AAGCTCAAGG 


GAACGTACAT 


1640 




CGATCCCTTG 


CCCGCCCTGG 


TCCACCCCAA 


GACGAACCGC 


1680 




CTCCACACCC 


GTTTCAACCA 



GACGGCCACC 


GCCACGGGGA 


1720 




GGCTTAGCAG 


CTCGGATCCC 


AACCTGCAAA 


ATATCCCCGT 


1760 


30 


GCGCACCCCT 


TTAGGCCAGC 


GGATCCGCCG 


GGCCTTCGTG 


1800 




GCCGAGGAGG 


GGTGGAGGCT 


GGTGGTTTTG 


GACTACAGCC 


1840 




AGATTGAGCT 


CAGGGTCCTG 


GCGCACCTTT 


CCGGGGACGA 


1880 




GAACCTGATC 


CGGGTCTTCC 


AAGAGGGCCA 


GGACATCCAC 


1920 






ACCCAGACGG 


CCAGCTGGAT 


GTTCGGCGTG 


CCCCCAGAGG 


1960 


CCGTGGATTC 


CCTGATGCGC 


CGGGCGGCCA 


AGACCATCAA 


2000 


CTACGGCGTC 


CTCTACGGCA 


TGTCCGCCCA 


CCGGCTTTCG 


2040


GGAGAGCTGG 


CCATCCCCTA 


CGAGGAAGCG 


GTGGCCTTCA 


\J O \J 


TCGAGCGGTA 


TTTCCAGAGC 


TTCCCCAAGG 


TAPGGGPPTG 


919 0 

Z. X Z< U 


GATTGAGAAA 


ACCCTGGCGG 


AAGGACGGGA 


GGGGGGPT A T 


oi en 

Z> x O U 


GTGGAAACCC 


TCTTTGGCCG 


CCGGCGCTAT 


GTGPPPGAPT 


9 9 no 
z. z, vj u 


TGGCTTCCCG 


GGTGAAGAGC 


ATCCGGGAGG 


CAGGGGAdPG 

wlwwwwxiwww 


994 O 
z, z; *± u 


CATGGCCTTC 


AACATGCCGG 


TPPAGGGGAP 


PGPPGPnr? at 


9 9 ft O 
z z o u 


TTGATGAAAC 


TGGPCATGGT 

X VJVJ^ V — -f^. X VJVJ X 


GAAGPTPTTT 


pppAnrcpTTr* 

^V-v^H.VJww XXV— 




AGGAGCTGGG 


GGCCAGGATG 


CTTTTGCAGG 


TGCACGACGA 


2360 


ACTGGTCCTC 


GAGGCTCCCA 


AGGAGCAAGC 


GGAGGAAGTC 


2400 


GCCCAGGAGG 


CCAAGCGGAC 


CATGGAGGAG 


GTGTGGCCCC 


2440 


TGAAGGTGCC 


CTTGGAGGTG 


GAGGTGGGTA 


TCGGGGAGGA 


2480 


CTGGCTTTCC 


GCCAAGGCCT 


AGTCGAC

In another embodiment, the invention provides a nucleic acid of SEQ ID
NO:4, a derivative nucleic acid related to Thermus scotoductus, strain X-1,
having GAC (encoding Asp) in place of GGG (encoding Gly) at positions 136-138.
SEQ ID NO:4 is provided below.

ATGAGGGCGA TGCTGCCCCT CTTTGAGCCC AAGGGCCGGG  40
TGCTTCTGGT GGACGGCCAC CACCTGGCCT ACCGTACCTT  80
TTTTGCCCTG AAGGGCCTCA CCACCAGCCG CGGGGAGCCG  120
GTCCAGGCGG TGTACGACTT TGCCAAGAGC CTTTTGAAGG  160
CGCTAAGGGA AGACGGGGAT GTGGTGATCG TGGTGTTTGA  200
CGCCAAGGCC CCCTCCTTCC GCCACCAGAC CTACGAGGCC  240
TACAAGGCGG GGCGGGCTCC CACCCCCGAG GACTTTCCCC  280
GGCAGCTTGC CCTTATCAAG GAGATGGTGG ACCTTTTGGG  320
CCTGGAGCGC CTCGAGGTGC CGGGCTTTGA GGCGGATGAC  360
GTCCTGGCTA CCCTGGCCAA GAAGGCGGAA AAGGAAGGCT  400
ACGAGGTGCG CATCCTCACC GCGGACCGGG ACCTTTACCA  440
GCTTCTTTCG GAGCGAATCT CCATCCTTCA CCCGGAGGGT  480
TACCTGATCA CCCCGGAGTG GCTTTGGGAG AAGTATGGGC  520
TTAAGCCTTC CCAGTGGGTG GACTACCGGG CCTTGGCCGG  560
GGACCCTTCC GACAACATCC CCGGCGTGAA GGGCATCGGG  600
GAGAAGACGG CGGCCAAGCT GATCCGGGAG TGGGGAAGCC  640
TGGAAAACCT TCTTAAGCAC CTGGAACAGG TGAAACCTGC  680
CTCCGTGCGG GAGAAGATCC TTAGCCACAT GGAGGACCTC  720
AAGCTATCCC TGGAGCTATC CCGGGTGCGC ACGGACTTGC  760
CCCTTCAGGT GGACTTCGCC CGGCGCCGGG AGCCGGACCG  800
GGAGGGGCTT AAGGCCTTTT TGGAGAGGCT GGAGTTCGGA  840
AGCCTCCTCC ACGAGTTCGG CCTGTTGGAA AGCCCGGTGG  880
CGGCGGAGGA AGCTCCCTGG CCGCCCCCCG AGGGAGCCTT  920
CGTGGGGTAC GTTCTTTCCC GCCCCGAGCC CATGTGGGCG  960
GAGCTTAACG CCTTGGCCGC CGCCTGGGAG GGAAGGGTTT  1000
ACCGGGCGGA GGATCCCTTG GAGGCCTTGC GGGGGCTTGG  1040
GGAGGTGAGG GGGCTTTTGG CCAAGGACCT GGCGGTGCTG  1080
GCCCTGAGGG AAGGGATTGC CCTGGCACCG GGCGACGACC  1120
CCATGCTCCT CGCCTACCTC CTGGATCCTT CCAACACCGC  1160
CCCCGAAGGG GTAGCCCGGC GCTACGGGGG GGAGTGGACC  1200
GAGGAGGCGG GGGAAAGGGC GTTGCTTTCC GAAAGGCTTT  1240
ACGCCGCCCT CCTGGAGCGG CTTAAGGGGG AGGAGAGGCT  1280
TCTTTGGCTT TACGAGGAGG TGGAAAAGCC CCTTTCGCGG  1320
GTCCTGGCCC ACATGGAGGC CACGGGGGTA CGGTTGGATG  1360
TGGCCTACTT AAAGGCCCTT TCCCTGGAGG TGGAGGCGGA  1400
GCTCAGGCGC CTCGAGGAGG AGGTCCACCG CCTGGCCGGG  1440
CATCCTTTCA ACCTGAACTC CCGGGACCAG CTGGAAAGGG  1480
TCCTCTTTGA CGAGCTTGGG CTTCCCGCCA TCGGCAAGAC  1520
GGAGAAGACG GGCAAGCGCT CCACCAGCGC CGCCGTTTTG  1560
GAGGCCTTGC GGGAGGCTCA TCCCATCGTG GACCGCATCC  1600
TTCAGTACCG GGAGCTTTCC AAGCTCAAGG GAACCTACAT  1640
CGATCCCTTG CCTGCCCTGG TCCACCCCAA GACGAACCGC  1680
CTCCACACCC GTTTCAACCA GACGGCCACC GCCACGGGGA  1720
GGCTTAGCAG CTCGGATCCC AACCTGCAAA ATATCCCCGT  1760
GCGCACCCCT TTGGGCCAGC GGATCCGCCG GGCCTTCGTG  1800
GCCGAGGAGG GGTGGAGGCT GGTGGTTTTG GACTACAGCC  1840
AGATTGAGCT CAGGGTCCTG GCGCACCTTT CCGGGGACGA  1880
GAACCTAATC CGGGTCTTCC AGGAGGGCCA GGACATCCAC  1920
ACCCAGACGG CCAGCTGGAT GTTCGGCGTG CCCCCAGAGG  1960
CCGTGGATTC CCTGATGCGT CGGGCGGCCA AGACCATCAA  2000
CTTCGGCGTC CTCTACGGCA TGTCCGCCCA CCGGCTTTCG  2040
GGAGAGCTGG CCATCCCCTA CGAGGAGGCG GTGGCCTTCA  2080
TCGAGCGGTA TTTCCAGAGG TACCCCAAGG TGCGGGCCTG  2120
GATTGAGAAA ACCCTGGCGG AAGGACGGGA ACGGGGCTAT  2160
GTGGAAACCC TCTTTGGCCG CGGGCGCTAC GTGCCCGACT  2200
TGGCTTCCCG GGTGAAGAGC ATCCGGGAGG CAGCGGAGCG  2240
CATGGCCTTC AACATGCCGG TCCAGGGGAC CGCCGCGGAT  2280
TTGATGAAAC TGGCCATGGT GAAGCTCTTT CCCAGGCTTC  2320
AGGAGCTGGG GGCCAGGATG CTTTTGCAGG TGCACGACGA  2360
ACTGGTCCTC GAGGCTCCCA AGGAGCAAGC GGAGGAAGTC  2400
GCCCAGGAGG CCAAGCGGAC CATGGAGGAG GTGTGGCCCC  2440
TGAAGGTGCC CTTGGAGGTG GAAGTGGGCA TCGGGGAGGA  2480
CTGGCTTTCC GCCAAGGCCT AG
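
The single-codon change that distinguishes SEQ ID NO:4 from the wild type strain X-1 sequence can be illustrated with a short sketch. The Python below is illustrative only: the function names `substitute_bases` and `codon_number` are hypothetical, the 40-base window is copied from positions 121-160 of the wild type listing, and the codon numbering assumes that the reading frame begins with the ATG at position 1.

```python
def substitute_bases(window: str, window_start: int, first: int, last: int,
                     expected: str, replacement: str) -> str:
    """Replace bases first..last (1-based, inclusive, numbered as in the listing).

    `window` is a stretch of the listing whose first base has position
    `window_start`. The current bases are checked before they are replaced.
    """
    i = first - window_start
    j = last - window_start + 1
    if i < 0 or j > len(window):
        raise ValueError("positions fall outside the supplied window")
    if len(replacement) != len(expected):
        raise ValueError("replacement must have the same length as the original bases")
    if window[i:j] != expected:
        raise ValueError(f"expected {expected} at {first}-{last}, found {window[i:j]}")
    return window[:i] + replacement + window[j:]


def codon_number(position: int, frame_start: int = 1) -> int:
    """Codon (1-based) containing a nucleotide position, for a frame starting at frame_start."""
    return (position - frame_start) // 3 + 1


# Positions 121-160 of the wild type strain X-1 listing.
WILD_TYPE_121_160 = "GTCCAGGCGG" "TGTACGGGTT" "TGCCAAGAGC" "CTTTTGAAGG"

# GGG (Gly) at positions 136-138 becomes GAC (Asp).
derivative = substitute_bases(WILD_TYPE_121_160, 121, 136, 138, "GGG", "GAC")
print(derivative)
print(codon_number(136), codon_number(138))
```

Checking the expected bases before substitution guards against an off-by-one error in the position numbering, which is easy to make when a listing is broken into blocks of ten.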
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In another embodiment, the invention provides a nucleic acid of SEQ ID 
NO:5, a derivative nucleic acid related to Thermus scotoductus, strain SM3,
having GAC (encoding Asp) in place of GGG (encoding Gly) at positions 136-138.
SEQ ID NO:5 is provided below.



ATGAGGGCGA TGCTGCCCCT 

20 TGCTTCTGGT GGACGGCCAC 
TTTTGCCCTG AAGGGCCTCA 
GTCCAGGCGG TGTACGACTT
CGCTAAGGGA AGACGGGGAT 
CGCCAAGGCC CCCTCCTTCC 

25 TACAAGGCGG GGCGGGCTCC 
GGCAGCTTGC CCTTATCAAG 
CCTGGAGCGC CTCGAAGTGC 
GTCCTGGCCA CCCTGGCCAA 
ACGAGGTGCG CATCCTCACC 

30 GCTTCTTTCG GACCGAATCT 
TACCTGATCA CCCCGGAGTG 
TTAAGCCTTC CCAGTGGGTG 
GGACCCTTCC GACAACATCC 
GAGAAGACGG CGGCCAAGCT 

35 TGGAAAACCT TCTTAAGCAC 



CTTTGAGCCC 


AAGGGCCGGG 


40 


CACCTGGCCT 


ACCGTACCTT 


80 


CCACCAGCCG 


CGGGGAGCCG 


120 


TGCCAAGAGC 


CTTTTGAAGG 


160 


GTGGTGATCG 


TGGTGTTTGA 


200 


GCCACCAGAC 


CTACGAGGCC 


240 


CACCCCCGAG 


GACTTTCCCC 


280 


GAGATGGTGG 


ACCTTTTGGG 


320 


CGGGTTTTGA 


GGCGGATGAC 


360 


GAAGGCGGAA 


AAGGAAGGCT 


400 


GCGGACCGGG 


ACCTTTACCA 


440 


CCATCCTTCA 


CCCGGAGGGT 


480 


GCTTTGGGAG 


AAGTATGGGC 


520 


GACTACCGGG 


CCTTGGCCGG 


560 


CCGGCGTGAA 


GGGCATCGGG 


600 


GATCCGGGAG 


TGGGGAAGCC 


640 


CTGGAACAGG 


TGAAACCTGC 


680 
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CTCCGTGCGG GAGAAGATCC 
AAGCTATCCC TGGAGCTTTC 
CCCTTCAGGT GGACTTCGCC 
GGAAGGGCTT AAGGCCTTTT 
5 AGCCTCCTCC ACGAGTTCGG 
CGGCGGAGGA AGCTCCCTGG 
CGTGGGGTAC GTTCTTTCCC 
GAGCTTAACG CCTTGGCCGC 
ACCGGGCGGA GGATCCCTTG 

10 GGAGGTGAGG GGGCTTTTGG 
GCCCTGAGGG AAGGGATTGC 
CCATGCTCCT CGCCTACCTC 
CCCCGAAGGG GTAGCCCGGC 
GAGGAGGCGG GGGAAAGGGC 

15 ACGCCGCCCT CCTGGAGCGG 
TCTTTGGCTT TACGAGGAGG 
GTCCTGGCCC ACATGGAGGC 
TGGCCTACTT GAAGGCCCTT 
GCTCAGGCGC CTCGAGGAGG 

20 CATCCTTTCA ACCTGAACTC 
TCCTCTTTGA CGAGCTTGGG 
GGAGAAGACG GGTAAGCGTT 
GAGGCTTTGA GGGAGGCTCA 
TCCAGTACCG GGAGCTTTCC 

25 CGATCCCTTG CCCGCCCTGG 
CTCCACACCC GTTTCAACCA 
GGCTTAGCAG CTCGGATCCC 
GCGCACCCCT TTAGGCCAGC 
GCCGAGGAGG GGTGGAGGCT 

30 AGATTGAGCT CAGGGTCCTG 
GAACCTGATC CGGGTCTTCC 
ACCCAGACGG CCAGCTGGAT 
CCGTGGATTC CCTGATGCGC 



TT PP n A P ft T 
X J. HbL K~ J. 


pnftnnftpPTP 


790 


p r*ncin. r mr i ft p 


ftppniftfyrTnp 


7£0 

/ *_> VJ 


VjC L.OV3VJ 


flpppnnZkrm 

Avj L V- O vj-rvV- Vj 


ouu 


rp/-i/-i * /-i TV P«P'P ,r P 


pp TV pTTmn A 


Of* U 


PPTPTTPP A A 
LL loll OvjAA 


AvjL L LLjo 1 VjV? 


0 O U 


LLvjLLL.L.L.L.Lj 


AVjLjVjAvjLL 1 X 


no rv 
U 


GL L. L LvjAGL L 


LA 1 G 1 GGVjLLj 




L.LjLL 1 GGGAG 


GGAAGGG1 1 J. 


1 a n a 
1UUU 


GAGGLC 1 1 GL 


GGGGGL 1 1 




CCAAGGACC 1 


GGLGG 1 GL 1 G 


1 n q a 


LL 1 GGCALAG 


GGLGAL bAL L 


llzl) 


U I GGA 1 LL 1 1 


LLAALALLGL 


llbl) 


GL 1 ALGGGGG 


GGAG 1 GGAL L 


1 O A Pi 

1ZVV 


GL1GC1 I ILL 


GAAAGGL 111 




CTTAAGGGGG 


AGGAGAGGC 1 


1280


TGGAAAAGCC 


CCTTTCGCGG 


1320 


CACGGGGGTA 


rpPprnm/^/*i TV rnpi 

TGGTTGGATG 


1360 


TCCCTGGAGG 


TGGAGGCGGA 


1400


AGGTCCACCG 


ACTGGCCGGG 


1440 


CCGGGACCAG 


CTGGAAAbGb 


1 A O A 

14 8 0 


CTTCCCGCCA 


TCGGCAAGAC 


lbzU 


CCACCAGCGC 


CGCCGTTTTG 


1560 


TLLCATAG J. G 


bAL LCjCA ILL 


1 A A 

16UU 


Tv 7v /"irn/TTV TV O/^l 

AAGCTCAAGG 


GAACGTACAT 


1 a a 
164 0 


TCCACCCCAA 


GACGAACCGC 


1 68 U 


GACGGCCACC 


GCCACGGGGA


X 1 Z\J 


AACCTGCAAA 


ATATCCCCGT 


T *7 A 
1 /60 


GGA I LLGLLG 


CjbLLl 1LG1G 


1 Q A A 
1 OU 0 


GGTGGTTTTG 


GACTACAGCC 


1840 


GCGCACCTTT 


CCGGGGACGA 


1880 


AAGAGGGCCA 


GGACATCCAC 


1920 


GTTCGGCGTG 


CCCCCAGAGG 


1960 


CGGGCGGCCA 


AGACCATCAA 


2000 








CTTCGGCGTC 


CTCTACGGCA 


TGTCCGCCCA 


CCGGCTTTCG 


2040 




GGAGAGCTGG 


CCATCCCCTA 


CGAGGAAGCG 


GTGGCCTTCA 


2080 




TCGAGCGGTA 


TTTCCAGAGC 


TACCCCAAGG 


TACGGGCCTG 


2120 




GATTGAGAAA 


ACCCTGGCGG 


AAGGACGGGA 


GCGGGGCTAT 


2160 


5 


GTGGAAACCC 


TCTTTGGCCG 


CCGGCGCTAT 


GTGCCCGACT 


2200 




TGGCTTCCCG 


GGTGAAGAGC 


ATCCGGGAGG 


CAGCGGAGGG 


2240 




CATGGCCTTC 


AACATGCCGG 


TCCAGGGGAC 


CGCCGCGGAT 


2280 




TTGATGAAAC 


TGGCCATGGT 


GAAGCTCTTT 


CCCAGGCTTC 


2320 




AGGAGCTGGG 


GGCCAGGATG 


CTTTTGCAGG 


TGCACGACGA 


2360 


10 


ACTGGTCCTC 


GAGGCTCCCA 


AGGAGCAAGC 


GGAGGAAGTC 


2400 




GCCCAGGAGG 


CCAAGCGGAC 


CATGGAGGAG 


GTGTGGCCCC 


2440 




TGAAGGTGCC 


CTTGGAGGTG 


GAGGTGGGTA 


TCGGGGAGGA 


2480 




CTGGCTTTCC 


GCCAAGGCCT 


AGTCGAC

In another embodiment, the invention provides a nucleic acid of SEQ ID
NO:6, a derivative nucleic acid related to Thermus scotoductus, strain Vi7a,
having GAC (encoding Asp) in place of GGG (encoding Gly) at positions 136-138.
SEQ ID NO:6 is provided below.


ATGAGGGCGA 


TGCTGCCCCT 


CTTTGAGCCC 


AAGGGCCGGG 


40 




TGCTTCTGGT 


GGACGGCCAC 


CACCTGGCCT 


ACCGTACCTT 


80 




TTTTGCCCTG 


AAGGGGCTCA 


CCACCAGCCG 


CGGGGAGCCG 


120 




GTCCAGGCGG 


TGTAGGACTT 


TGCCAAGAGC 


CTTTTGAAGG 


160 




CGCTAAGGGA 


AGACGGGGAT 


GTGGTGATCG 


TGGTGTTTGA 


200 


25 


CGCCAAGGCC 


CCCTCCTTGC 


GCCACCAGAC 


CTACGAGGCC 


240 




TACAAGGCGG 


GGCGGGCTCC 


CACCCCCGAG 


GACTTTCCCC 


280 




GGCAGCTTGC 


CCTTATCAAG 


GAGATGGTGG 


ACCTTTTGGG 


320 




CCTGGAGCGC 


CTCGAAGTGC 


CGGGTTTTGA 


GGCGGATGAC 


360 




GTCCTGGCCA 


CCCTGGCCAA 


GAAGGCGGAA 


AAGGAAGGCT 


400 


30 


ACGAGGTGCG 


CATCCTCACC 


GCGGACCGGG 


ACCTTTACCA 


440 




GCTTCTTTCG 


GACCGAATCT 


CCATCCTTCA 


CCCGGAGGGT 


480 




TACCTGATTA 


CCCCGGAGTG 


GCTTTGGGAG 


AAGTATGGGC 


520 




TTAAGCCTTC 


CGAGTGGGTG 


GACTACCGGG 


CCTTGGCCGG 


560 
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GGACCCTTCC GACAACATCC 
GAGAAGACGG CGGCCAAGCT 
TGGAAAACCT TCTTAAGCAC 
CTCCGTGCGG GAGAAGATCC 
5 AAGCTATCCC TGGAGCTTTC 
CCCTTCAGGT GGACTTCGCC 
GGAAGGGCTT AAGGCCTTTT 
AGCCTCCTCC ACGAGTTCGG 
CGGCGGAGGA AGCTCCCTGG 

10 CGTGGGGTAC GTTCTTTCCC 
GAGCTTAACG CCTTGGCCGC 
ACCGGGCGGA GGATCCCTTG 
GGAGGTGAGG GGGCTTTTGG 
GCCCTGAGGG AAGGGATTGC 

15 CCATGCTCCT CGCCTACCTC 
CCCCGAAGGG GTAGCCCGGC 
GAGGAGGCGG GGGAAAGGGC 
ACGCCGCCCT CCTGGAGCGG 
TCTTTGGCTT TACGAGGAGG 

20 GTCCTGGCCC ACATGGAGGC 
TGGCCTACTT GAAGGCCCTT 
GCTCAGGCGC CTCGAGGAGG 
CATCCTTTCA ACCTGAACTC 
TCCTCTTTGA CGAGCTTGGG 

25 GGAGAAGACG GGTAAGCGTT 
GAGGCTTTGA GGGAGGCTCA 
TCCAGTACCG GGAGCTTTCC 
CGATCCCTTG GCCGCCCTGG 
CTCCACACCC GTTTCAACCA 

30 GGCTTAGCAG CTCGGATCCC 
GCGCACCCCT TTAGGCCAGC 
GCCGAGGAGG GGTGGAGGCT 
AGATTGAGCT CAGGGTCCTG 



CCGGCGTGAA 


GGGCATCGGG 


600 


GATCCGGGAG 


TGGGGAAGCC 


640 


CTGGAACAGG 


TGAAACCTGC 


680 


TTAGCCACAT 


GGAGGACCTC 


720 


CCGGGTGCAC 


ACGGAGTTGC 


760 


CGGCGCCGGG 


AGCCGGACCG 


800 


TGGAGAGGCT 


GGAGTTCGGA 


840 


CCTGTTGGAA 


AGCCCGGTGG 

*InJ ' » x^ XJ X-* -J- WW 


880 


CCGCCCCCCG 


AGGGAGCCTT 


920 


GCCCCGAGCC 


CATGTGGGCG 


960 


CGCCTGGGAG 


GGAAGGGTTT 

W VJiwlV W X,*l XXX 


1000 


GAGGCCTTGC 


GGGGGCTTGG 

V— ' * — ' vJVJVJ V — ■ X x VJvJ 


1040 


C PAAGGAPPT 


GGCGGTGPTG 


1080 

JL W V \J 


PPTGGPAPPG 


GGPGAPGAPP 


1120 


PTGGATPPTT 

x vjun x v— x x 


PPAAPACPGP 


1160 

X J. U u 




GGAGTGGAPP 


1200 


GPTGPTTTPP 


GAAAGGPTTT 

VJrinnvJVJw x x J- 


1240 


PTTAAGGGGG 


AGGAGAGGPT 


1280 

X6 U v 


TGGAAAAGPP 


PPTTTPGPGG 

x_ V — . XXX wVJwVJw 


1320 


WAV* wwVJVJVJ X 


TGGTTGGATG 

X VJVJ X X VJVJ/1 X w 


J ) \J \J 


TPPPTGGAGG 


TGGAGGPGGA 


1400 


AGGTPPAPPG 


APTGGPPGGG 


1440 

J. n 7 w 


PPGGGAPPAG 


PTGGAAAGGG 


1480 


CTTCCCGCCA 


TCGGCAAGAC 


1520 


CCACCAGCGC 


CGCCGTTTTG 

Xw x_J Xv Xv X.J .X X X J- 


1560 


TCCCATAGTG 

x >— > — x ravj x vj 


GACCGCATCC 

W*XW wvV*** X x^ x-. 


1600 


AAGPTPAAGG 


GAAPGTAPAT 


1640 

X UTS V 


TPPAPPPPAA 


GAPGAAPPGP 


1680 


GACGGCCACC 


GCCACGGGGA 


1720 


AACCTGCAAA 


ATATCCCCGT 


1760 


GGATCCGCCG 


GGCCTTCGTG 


1800 


GGTGGTTTTG 


GACTACAGCC 


1840 


GCGCACCTTT 


CCGGGGACGA 


1880 








GAACCTGATC 


CGGGTCTTCC 


AAGAGGGCCA 


GGACATCCAC 


1920 




ACCCAGACGG 


CCAGCTGGAT 


GTTCGGCGTG 


CCCCCAGAGG 


1960 




CCGTGGATTC 


CCTGATGCGC 


CGGGCGGCCA 


AGACCATCAA 


2000 




CTACGGCGTC 


CTCTACGGCA 

V 

CCATCCCCTA 

v» v_xx x v«» ^» v-» x xx 


TGTCCGCCCA 


CCGGCTTTCG 


2040 


5 


GGAGAGCTGG 


CGAGGAAGCG 


GTGGCCTTCA 


2080 




TCGAGCGGTA 


TTTCCAGAGC 


TTCCCCAAGG 


TACGGGCCTG 


2120 




GATTGAGAAA 


ACCCTGGCGG 


AAGGACGGGA 


GCGGGGCTAT 


2160 




GTGGAAACCC 


TGTTTGGGCG 


CCGGCGCTAT 


GTGCCCGACT 


2200 




TGGCTTCCCG


GGTGAAGAGC 


ATCCGGGAGG 


CAGCGGAGCG 


2240 


10 


CATGGCCTTC 


AACATGCCGG


TCCAGGGGAC 


CGCCGCGGAT 


2280 




TTGATGAAAC 


TGGCCATGGT 

X \J\JV V— xi. X \J\J X 


GAAGCTCTTT 


CCCAGGCTTC 


2320 




AGGAGCTGGG 


GGCCAGGATG 


CTTTTGCAGG 


TGCACGACGA 


2360 




ACTGGTCCTC 


GAGGCTCCCA 


AGGAGCAAGC 


GGAGGAAGTC 


2400 




GCCCAGGAGG 


CCAAGCGGAC 


CATGGAGGAG 


GTGTGGCCCC 


2440 


15 


TGAAGGTGCC 


CTTGGAGGTG 


GAGGTGGGTA 


TCGGGGAGGA 


2480 




CTGGCTTTCC 


GCCAAGGCCT 


AGTCGAC 
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In another embodiment, the invention provides a nucleic acid of SEQ ID 
NO: 7, a derivative nucleic acid related to Thermus scotoductus, strain X-l, 
having TAC (encoding Tyr) in place of TTC (encoding Phe) at positions 2002- 
20 04. SEQ ID NO:7 is provided below: 





ATGAGGGCGA 


TGCTGCCCCT 


CTTTGAGCCC 


AAGGGCCGGG 


40 




TGCTTCTGGT 


GGACGGCCAC 


CACCTGGCCT 


ACCGTACCTT 


80 




TTTTGCCCTG 


AAGGGCCTCA 


CCACCAGCCG 


CGGGGAGCCG 


120 


25 


GTCCAGGCGG 


TGTACGGGTT 


TGCCAAGAGC 


CTTTTGAAGG 


160 




CGCTAAGGGA 


AGACGGGGAT 


GTGGTGATCG 


TGGTGTTTGA 


200 




CGCCAAGGCC 


CCCTCCTTCC 


GCCACCAGAC 


CTACGAGGCC 


240 




TACAAGGCGG 


GGCGGGCTCC 


CACCCCCGAG 


GAGTTTCCCC 


280 




GGCAGCTTGC 


CCTTATCAAG 


GAGATGGTGG 


ACCTTTTGGG 


320 


30 


CCTGGAGCGC 


CTCGAGGTGC 


CGGGCTTTGA 


GGCGGATGAC 


360 




GTCCTGGCTA 


CCCTGGCCAA 


GAAGGCGGAA 


AAGGAAGGCT 


400 




ACGAGGTGCG 


CATCCTCACC 


GCGGACCGGG 


ACCTTTACCA 


440 




GCTTCTTTCG 


GAGCGAATCT 


CCATCCTTCA 


CCCGGAGGGT 


480 




TACCTGATCA 


CCCCGGAGTG 


GCTTTGGGAG 


AAGTATGGGC 


520 


35 


TTAAGCCTTC 


CCAGTGGGTG 


GACTACCGGG 


CCTTGGCCGG 


560 








GGACCCTTCC 


GACAACATCC 


CCGGCGTGAA 


GGGCATCGGG 


600 




GAGAAGACGG 


CGGCCAAGCT 


GATCCGGGAG 


TGGGGAAGCC 


640 




TGGAAAACCT 


TCTTAAGCAC 


CTGGAACAGG 


TGAAACCTGC 


680 




CTCCGTGCGG 


GAGAAGATCC 


TTAGCCACAT 


GGAGGACCTC 


720 


5 


AAGCTATCCC 


TGGAGCTATC 


CCGGGTGCGC 


ACGGACTTGC 


760 




CCCTTCAGGT 


GGACTTCGCC 


CGGCGCCGGG 


AGCCGGACCG 


800 




GGAGGGGCTT 


AAGGCCTTTT 


TGGAGAGGCT 


GGAGTTCGGA 


840 




AGCCTCCTCG 


ACGAGTTCGG 


CCTGTTGGAA 


AGCCCGGTGG 


880 




CGGCGGAGGA 


AGCTCCCTGG 


CCGCCCCCCG 


AGGGAGCCTT 


920 


10 


CGTGGGGTAC 


GTTCTTTCCC 


GCCCCGAGCC 


CATGTGGGCG 


960 




GAGCTTAACG 


CCTTGGCCGC 


CGCCTGGGAG 


GGAAGGGTTT 


1000 




ACCGGGCGGA 


GGATCCCTTG 


GAGGCCTTGC 


GGGGGCTTGG 


1040 




GGAGGTGAGG 


GGGCTTTTGG 


CCAAGGACCT 


GGCGGTGCTG 


1080 




GCCCTGAGGG 


AAGGGATTGC 


CCTGGCACCG 


GGCGACGACC 


1120 


15 


CCATGCTCCT 


CGCCTACCTC 


CTGGATCCTT 


CCAACACCGC 


1160 




CCCCGAAGGG 


GTAGCCCGGC 


GCTACGGGGG 


GGAGTGGACC 


1200 




GAGGAGGCGG 


GGGAAAGGGC 


GTTGCTTTCC 


GAAAGGCTTT 


1240




ACGCCGCCCT 


CCTGGAGCGG 


CTTAAGGGGG 


AGGAGAGGCT 


1280 




TCTTTGGCTT 


TACGAGGAGG 


TGGAAAAGCC 


CCTTTCGCGG 


1320 


20 


GTCCTGGCCC 


ACATGGAGGC 


CACGGGGGTA 


CGGTTGGATG 


1360 




TGGCCTACTT 


AAAGGCCCTT 


TCCCTGGAGG 


TGGAGGCGGA 


1400 




GCTCAGGCGC 


CTCGAGGAGG 


AGGTCCACCG 


CCTGGCCGGG 


1440 




CATCCTTTCA 


ACCTGAACTC 


CCGGGACCAG 


CTGGAAAGGG 


1480 




TCCTCTTTGA 


CGAGCTTGGG 


CTTCCCGCCA 


TCGGCAAGAC 


1520 


25 


GGAGAAGACG 


GGCAAGCGCT 


CCACCAGCGC 


CGCCGTTTTG 


1560 




GAGGCCTTGC 


GGGAGGCTCA 


TCCCATCGTG 


GACCGCATCC 


1600 




TTCAGTACCG 


GGAGCTTTCC 


AAGCTCAAGG 


GAACCTACAT 


1640 




CGATCCCTTG 


CCTGCCCTGG 


TCCACCCCAA 


GACGAACCGC 


1680 




CTCCACACCC 


GTTTCAACCA 


GACGGCCACC 


GCCACGGGGA 


1720 


30 


GGCTTAGCAG 


CTCGGATCCC 


AACCTGCAAA 


ATATCCCCGT 


1760 




GCGCACCCCT 


TTGGGCCAGC 


GGATCCGCCG 


GGCCTTCGTG 


1800 




GCCGAGGAGG 


GGTGGAGGCT 


GGTGGTTTTG 


GACTACAGCC 


1840 




AGATTGAGCT 


CAGGGTCCTG 


GCGCACCTTT 


CCGGGGACGA 


1880 




GAACCTAATC 


CGGGTCTTCC 


AGGAGGGCCA 


GGACATCCAC 


1920 


35 


ACCCAGACGG 


CCAGCTGGAT 


GTTCGGCGTG 


CCCCCAGAGG 


1960 




CCGTGGATTC 


CCTGATGCGT 


CGGGCGGCCA 


AGACCATCAA 


2000 




CTACGGCGTC 


CTCTACGGCA 


TGTCCGCCCA 


CCGGCTTTCG 


2040 






GGAGAGCTGG 


CCATCCCCTA 


CGAGGAGGCG 


GTGGCCTTCA 


2080 


TCGAGCGGTA 


TTTCCAGAGC 


TACCCCAAGG 


TGCGGGCCTG 


2120 


GATTGAGAAA 


ACCCTGGCGG 


AAGGACGGGA 


ACGGGGCTAT 


2160 


GTGGAAACCC 


TCTTTGGCCG 


CCGGCGCTAC 


GTGCCCGACT 


2200 


TGGCTTCCCG 


GGTGAAGAGC 


ATCCGGGAGG 


CAGCGGAGCG 


2240 


CATGGCCTTC 


AACATGCCGG 


TCCAGGGGAC 


CGCCGCGGAT 


2280 


TTGATGAAAC 


TGGCCATGGT 


GAAGCTCTTT 


CCCAGGCTTC 


2320 


AGGAGCTGGG 


GGCCAGGATG 


CTTTTGCAGG 


TGCACGACGA 


2360 


ACTGGTCCTC 


GAGGCTCCCA 


AGGAGCAAGC 


GGAGGAAGTC 


2400 


GCCCAGGAGG 


CCAAGCGGAC 


CATGGAGGAG 


GTGTGGCCCC 


2440 


TGAAGGTGCC 


CTTGGAGGTG 


GAAGTGGGCA 


TCGGGGAGGA 


2480 


CTGGCTTTCC 


GCCAAGGCCT 


AG

In another embodiment, the invention provides a nucleic acid of SEQ ID
NO:8, a derivative nucleic acid related to Thermus scotoductus, strain SM3,
having TAC (encoding Tyr) in place of TTC (encoding Phe) at positions
2002-04. SEQ ID NO:8 is provided below:



20 


ATGAGGGCGA 


TGCTGCCCCT 


CTTTGAGCCC 


AAGGGCCGGG 


40 




TGCTTCTGGT 


GGACGGCCAC 


CACCTGGCCT 


ACCGTACCTT 


80 




TTTTGCCCTG 


AAGGGCCTCA 


CCACCAGCCG 


CGGGGAGCCG 


120 




GTCCAGGCGG 


TGTACGGGTT 


TGCCAAGAGC 


CTTTTGAAGG 


160 




CGCTAAGGGA 


AGACGGGGAT 


GTGGTGATCG 


TGGTGTTTGA 


200 


25 


CGCCAAGGCC 


CCCTCCTTCC 


GCCACCAGAC 


CTACGAGGCC 


240 




TACAAGGCGG 


GGCGGGCTCC 


CACCCCCGAG 


GACTTTCCCC 


280 




GGCAGCTTGC 


CCTTATCAAG 


GAGATGGTGG 


ACCTTTTGGG 


320 




CCTGGAGCGC 


CTCGAAGTGC 


CGGGTTTTGA 


GGCGGATGAC 


360 




GTCCTGGCCA 


CCCTGGCCAA 


GAAGGCGGAA 


AAGGAAGGCT 


400 


30 


ACGAGGTGCG 


CATCCTCACC 


GCGGACCGGG 


ACCTTTACCA 


440 




GCTTCTTTCG 


GACCGAATCT 


CCATCCTTCA 


CCCGGAGGGT 


480 




TACCTGATCA 


CCCCGGAGTG 


GCTTTGGGAG 


AAGTATGGGC 


520 




TTAAGCCTTC 


CCAGTGGGTG 


GACTACCGGG 


CCTTGGCCGG 


560 




GGACCCTTCC 


GACAACATCC 


CCGGCGTGAA 


GGGCATCGGG 


600 


35 


GAGAAGACGG 


CGGCCAAGCT 


GATCCGGGAG 


TGGGGAAGCC 


640 






TGGAAAACCT TCTTAAGCAC CTGGAACAGG TGAAACCTGC 680
CTCCGTGCGG GAGAAGATCC TTAGCCACAT GGAGGACCTC 720
AAGCTATCCC TGGAGCTTTC CCGGGTGCAC ACGGAGTTGC 760
CCCTTCAGGT GGACTTCGCC CGGCGCCGGG AGCCGGACCG 800
GGAAGGGCTT AAGGCCTTTT TGGAGAGGCT GGAGTTCGGA 840
AGCCTCCTCC ACGAGTTCGG CCTGTTGGAA AGCCCGGTGG 880
CGGCGGAGGA AGCTCCCTGG CCGCCCCCCG AGGGAGCCTT 920
CGTGGGGTAC GTTCTTTCCC GCCCCGAGCC CATGTGGGCG 960
GAGCTTAACG CCTTGGCCGC CGCCTGGGAG GGAAGGGTTT 1000
ACCGGGCGGA GGATCCCTTG GAGGCCTTGC GGGGGCTTGG 1040
GGAGGTGAGG GGGCTTTTGG CCAAGGACCT GGCGGTGCTG 1080
GCCCTGAGGG AAGGGATTGC CCTGGCACAG GGCGACGACC 1120
CCATGCTCCT CGCCTACCTC CTGGATCCTT CCAACACCGC 1160
CCCCGAAGGG GTAGCCCGGC GCTACGGGGG GGAGTGGACC 1200
GAGGAGGCGG GGGAAAGGGC GCTGCTTTGC GAAAGGCTTT 1240
ACGCCGCCCT CCTGGAGCGG CTTAAGGGGG AGGAGAGGCT 1280
TCTTTGGCTT TACGAGGAGG TGGAAAAGCC CCTTTCGCGG 1320
GTCCTGGCCC ACATGGAGGC CACGGGGGTA TGGTTGGATG 1360
TGGCCTACTT GAAGGCCCTT TCCCTGGAGG TGGAGGCGGA 1400
GCTCAGGCGC CTCGAGGAGG AGGTCCACCG ACTGGCCGGG 1440
CATCCTTTCA ACCTGAACTC CCGGGACCAG CTGGAAAGGG 1480
TCCTCTTTGA CGAGCTTGGG CTTCCCGCCA TCGGCAAGAC 1520
GGAGAAGACG GGTAAGCGTT CCACCAGCGC CGCCGTTTTG 1560
GAGGCTTTGA GGGAGGCTCA TCCCATAGTG GACCGCATCC 1600
TCCAGTACCG GGAGCTTTCC AAGCTCAAGG GAACGTACAT 1640
CGATCCCTTG CCCGCCCTGG TCCACCCCAA GACGAACCGC 1680
CTCCACACCC GTTTCAACCA GACGGCCACC GCCACGGGGA 1720
GGCTTAGCAG CTCGGATCCC AACCTGCAAA ATATCCCCGT 1760
GCGCACCCCT TTAGGCCAGC GGATCCGCCG GGCCTTCGTG 1800
GCCGAGGAGG GGTGGAGGCT GGTGGTTTTG GACTACAGCC 1840
AGATTGAGCT CAGGGTCCTG GCGCACCTTT CCGGGGACGA 1880
GAACCTGATC CGGGTCTTCC AAGAGGGCCA GGACATCCAC 1920
ACCCAGACGG CCAGCTGGAT GTTCGGCGTG CCCCCAGAGG 1960






CCGTGGATTC 


CCTGATGCGC 


CGGGCGGCCA 


AGACCATCAA 


2000 


CTACGGCGTC 


CTCTACGGCA 


TGTCCGCCCA 


CCGGCTTTCG 


2040 


GGAGAGCTGG 


CCATCCCCTA 


CGAGGAAGCG 


GTGGCCTTCA 


2080 


TCGAGCGGTA 


TTTCCAGAGC 


TACCCCAAGG 


TACGGGCCTG 


2120 


GATTGAGAAA 


ACCCTGGCGG 


AAGGACGGGA 


GCGGGGCTAT 


2160 


GTGGAAACCC 


TCTTTGGCCG 


CCGGCGCTAT 


GTGCCCGACT 


2200 


TGGCTTCCCG 


GGTGAAGAGC 


ATCCGGGAGG 


CAGCGGAGCG 


2240 


CATGGCCTTC 


AACATGCCGG 


TCCAGGGGAC 


CGCCGCGGAT 


2280 


TTGATGAAAC 


TGGCCATGGT 


GAAGCTCTTT 


CCCAGGCTTC 


2320 






CTTTTGCAGG


TGCACGACGA 


2360 


ACTGGTCCTC 


GAGGCTCCCA 


AGGAGCAAGC 


GGAGGAAGTC 


2400 


GCCCAGGAGG 


CCAAGCGGAC 


CATGGAGGAG 


GTGTGGCCCC 


2440 


TGAAGGTGCC 


CTTGGAGGTG 


GAGGTGGGTA 


TCGGGGAGGA 


2480 


CTGGCTTTCC 


GCCAAGGCCT 


AGTCGAG

In another embodiment, the invention provides a nucleic acid of SEQ ID
NO:9, a derivative nucleic acid related to Thermus scotoductus, strain Vi7a,
having TAC (encoding Tyr) in place of TTC (encoding Phe) at positions
2101-03. SEQ ID NO:9 is provided below:



ATGAGGGCGA 


TGCTGCCCCT 


CTTTGAGCCC 


AAGGGCCGGG 


40 


TGCTTCTGGT 


GGACGGCCAC 


CACCTGGCCT 


ACCGTACCTT 


80 


TTTTGCCCTG 


AAGGGCCTCA 


CCACCAGCCG 


CGGGGAGCCG 


120 


GTCCAGGCGG 


TGTACGGGTT 


TGCCAAGAGC 


CTTTTGAAGG 


160 


CGCTAAGGGA 


AGACGGGGAT 


GTGGTGATCG 


TGGTGTTTGA 


200 


CGCCAAGGCC 


CCCTCCTTCC 


GCCACCAGAC 


CTACGAGGCC 


240 


TACAAGGCGG 


GGCGGGCTCC 


CACCCCCGAG 


GACTTTCCCC 


280 


GGCAGCTTGC 


CCTTATCAAG 


GAGATGGTGG 


ACCTTTTGGG 


320 


CCTGGAGCGC 


CTCGAAGTGC 


CGGGTTTTGA 


GGCGGATGAC 


360 


GTCCTGGCCA 


CCCTGGCCAA 


GAAGGCGGAA 


AAGGAAGGCT 


400 


ACGAGGTGCG 


CATCCTCACC 


GCGGACCGGG 


ACCTTTACCA 


440 


GCTTCTTTCG 


GACCGAATCT 


CCATCCTTCA 


CCCGGAGGGT 


480 








TACCTGATTA 


CCCCGGAGTG 


GCTTTGGGAG 


AAGTATGGGC 


520 




TTAAGCCTTC 


CCAGTGGGTG 


GACTACCGGG 


CCTTGGCCGG 


560 




GGACCCTTCC 


GACAACATCC 


CCGGCGTGAA 


GGGCATCGGG 


600 




GAGAAGACGG 


CGGCCAAGCT 


GATCCGGGAG 


TGGGGAAGCC 


640 


5 


TGGAAAACCT 


TCTTAAGCAC 


CTGGAACAGG 


TGAAACCTGC 


680 




CTCCGTGCGG 


GAGAAGATCC 


TTAGCCACAT 


GGAGGACCTC 


720 




AAGCTATCCC 


TGGAGCTTTC 


CCGGGTGCAC 


ACGGAGTTGC 


760 




CCCTTCAGGT 


GGACTTCGCC 


CGGCGCCGGG 


AGCCGGACCG 


800 




GGAAGGGCTT 


AAGGCCTTTT 


TGGAGAGGCT 


GGAGTTCGGA 


840 


10 


AGCCTCCTCC 


ACGAGTTCGG 


CCTGTTGGAA 


AGCCCGGTGG 


880 




CGGCGGAGGA 


AGCTCCCTGG 


CCGGCCCCCG 


AGGGAGCCTT 


920 




CGTGGGGTAC 


GTTCTTTCCC 


GCCCCGAGCC 


CATGTGGGCG 


960 




GAGCTTAACG 


CCTTGGCCGC 


CGGCTGGGAG 


GGAAGGGTTT 


1000 




ACCGGGCGGA 


GGATCCCTTG 


GAGGGCTTGC 


GGGGGCTTGG 


1040 


15 


GGAGGTGAGG 


GGGCTTTTGG 


CCAAGGACCT 


GGCGGTGCTG 


1080 




GCCCTGAGGG 


AAGGGATTGC 


CCTGGCACCG 


GGCGACGACC 


1120 




CCATGCTCCT 


CGCCTACCTC 


CTGGATCCTT 


CCAACACCGC 


1160 




CCCCGAAGGG 


GTAGCCCGGC 


GCTACGGGGG 


GGAGTGGACC 


1200 




GAGGAGGCGG 


GGGAAAGGGC 


GCTGCTTTCC 


GAAAGGCTTT 


1240 


20 


ACGCCGCCCT 


CCTGGAGCGG 


CTTAAGGGGG 


AGGAGAGGCT 


1280 




TCTTTGGCTT 


TACGAGGAGG 


TGGAAAAGCC 


CCTTTCGCGG 


1320 




GTCCTGGCCC 


ACATGGAGGC 


CACGGGGGTA 


TGGTTGGATG 


1360 




TGGCCTACTT 


GAAGGCCCTT 


TCCCTGGAGG 


TGGAGGCGGA 


1400 




GCTCAGGCGC 


CTCGAGGAGG 


AGGTCCACCG 


ACTGGCCGGG 


1440 


25 


CATCCTTTCA 


ACCTGAACTC 


CCGGGACCAG 


CTGGAAAGGG 


1480 




TCCTCTTTGA 


CGAGCTTGGG 


CTTCCCGCCA 


TCGGCAAGAC 


1520 




GGAGAAGACG 


GGTAAGCGTT 


CCACCAGCGC 


CGCCGTTTTG 


1560 




GAGGCTTTGA 


GGGAGGCTCA 


TCCCATAGTG 


GACCGCATCC 


1600 




TCCAGTACCG 


GGAGCTTTCC 


AAGCTCAAGG 


GAACGTACAT 


1640 


30 


CGATCCCTTG 


CCCGCCCTGG 


TCCACCCCAA 


GACGAACCGC 


1680 




CTCCACACCC 


GTTTCAACCA 


GACGGCCACC 


GCCACGGGGA 


1720 




GGCTTAGCAG 


CTCGGATCCC 


AACCTGCAAA 


ATATCCCCGT 


1760 




GCGCACCCCT 


TTAGGCCAGC 


GGATCCGCCG 


GGCCTTCGTG 


1800 






GCCGAGGAGG 


GGTGGAGGCT 


GGTGGTTTTG 


GACTACAGCC 


1840 


AGATTGAGCT 


CAGGGTCCTG 


GCGGACCTTT 


CCGGGGACGA 


1880 


GAACCTGATC 


CGGGTCTTCC 


AAGAGGGCCA 


GGACATCCAC 


1920 


ACCCAGACGG 


CCAGCTGGAT 


GTTCGGCGTG 


CCCCCAGAGG 


1960 


CCGTGGATTC 


CCTGATGCGC 


CGGGCGGCCA 


AGACCATCAA 


2000 


CTACGGCGTC 


CTCTACGGCA 


TGTCCGCCCA 


CCGGCTTTCG 


2040 


GGAGAGCTGG 


CCATCCCCTA 


CGAGGAAGCG 


GTGGGCTTCA 


2080 


TCGAGCGGTA 


TTTCCAGAGC 


TACCCCAAGG 


TACGGGCCTG 


2120 


GATTGAGAAA 


ACCCTGGCGG 


AAGGACGGGA 


GCGGGGCTAT 


2160 


GTGGAAACCC 


TCTTTGGCCG 


CCGGCGCTAT 


GTGCCCGACT 


2200 


TGGCTTCCCG 


GGTGAAGAGC 


ATCCGGGAGG 


CAGCGGAGCG 


2240 


CATGGCCTTC 


AACATGCCGG 


TCCAGGGGAC 


CGCCGCGGAT 


2280 


TTGATGAAAC 


TGGCCATGGT 


GAAGCTCTTT 



CCCAGGCTTC 


2320 


AGGAGCTGGG 


GGCCAGGATG 


CTTTTGCAGG 


TGCACGACGA 


2360 


ACTGGTCCTC 


GAGGCTCCCA 


AGGAGCAAGC 


GGAGGAAGTC 


2400 


GCCCAGGAGG 


CCAAGCGGAC 


CATGGAGGAG 


GTGTGGCCCC 


2440 


TGAAGGTGCC 


CTTGGAGGTG 


GAGGTGGGTA 


TCGGGGAGGA 


2480 


CTGGCTTTCC 


GCCAAGGCCT 


AGTCGAC

In another embodiment, the invention provides a nucleic acid of SEQ ID
NO:10, a derivative nucleic acid related to Thermus scotoductus, strain X-1,
having GAC (encoding Asp) in place of GGG (encoding Gly) at positions
136-138 and having TAC (encoding Tyr) in place of TTC (encoding Phe) at
positions 2002-04. SEQ ID NO:10 is provided below:





ATGAGGGCGA 


TGCTGCCCCT 


CTTTGAGCCC 


AAGGGCCGGG 


40 




TGCTTCTGGT 


GGACGGCCAC 


CACCTGGCCT 


ACCGTACCTT 


80 




TTTTGCCCTG 


AAGGGCCTCA 


CCACCAGCCG 


CGGGGAGCCG 


120 


30 


GTCCAGGCGG 


TGTACGACTT 


TGCCAAGAGC 


CTTTTGAAGG 


160 




CGCTAAGGGA 


AGACGGGGAT 


GTGGTGATCG 


TGGTGTTTGA 


200 




CGCCAAGGCC 


CCCTCCTTCC 


GCCACCAGAC 


CTACGAGGCC 


240 




TACAAGGCGG 


GGCGGGCTCG 


CACCCCCGAG 


GACTTTCCCC 


280 




GGCAGCTTGC 


CCTTATCAAG 


GAGATGGTGG 


ACCTTTTGGG 


320 


35 


CCTGGAGCGC 


CTCGAGGTGC 


CGGGCTTTGA 


GGCGGATGAC 


360 








GTCCTGGCTA 


CCCTGGCCAA 


GAAGGCGGAA 


AAGGAAGGCT 


400 




ACGAGGTGCG 


CATCCTCACC 


GCGGACCGGG 


ACCTTTACCA 


440 




GCTTCTTTCG 


GAGCGAATCT 


CCATCCTTCA 


CCCGGAGGGT 


480 




TACCTGATCA 


CCCCGGAGTG 


GCTTTGGGAG 


AAGTATGGGC 


520 


5 


TTAAGCCTTC 


CCAGTGGGTG 


GACTACCGGG 


CCTTGGCCGG 


560 




GGACCCTTCC 


GACAACATCC 


CCGGCGTGAA 


GGGCATCGGG 


600 




GAGAAGACGG 


CGGCCAAGCT 


GATCCGGGAG 


TGGGGAAGCC 


640 




TGGAAAACCT 


TCTTAAGCAC 


CTGGAACAGG 


TGAAACCTGC 


680 




CTCCGTGCGG 


GAGAAGATCC 


TTAGCCACAT 


GGAGGACCTC 


720 


10 


AAGCTATCCC 


TGGAGCTATC 


CCGGGTGCGC 


ACGGACTTGC 


760 




CCCTTCAGGT 


GGACTTCGCC 


CGGCGCCGGG 


AGCCGGACCG 


800 




GGAGGGGCTT 


AAGGCCTTTT 


TGGAGAGGCT 


GGAGTTCGGA 


840 




AGCCTCCTCC 


ACGAGTTCGG 


CCTGTTGGAA 


AGCCCGGTGG 


880 




CGGCGGAGGA 


AGCTCCCTGG 


CCGCCCCCCG 


AGGGAGCCTT 


920 


15 


CGTGGGGTAC 


GTTCTTTCCC 


GCCCCGAGCC 


CATGTGGGCG 


960 




GAGCTTAACG 


CCTTGGCCGC 


CGCCTGGGAG 


GGAAGGGTTT 


1000 




ACCGGGCGGA 


GGATCCCTTG 


GAGGCCTTGC 


GGGGGCTTGG 


1040 




GGAGGTGAGG 


GGGCTTTTGG 


CCAAGGACCT 


GGCGGTGCTG 


1080 




GCCCTGAGGG 


AAGGGATTGC 


CCTGGCACCG 


GGCGACGACC 


1120 


20 


CCATGCTCCT 


CGCCTACCTC 


CTGGATCCTT 


CCAACACCGC 


1160 




CCCCGAAGGG 


GTAGCCCGGC 


GCTACGGGGG 


GGAGTGGACC 


1200 




GAGGAGGCGG 


GGGAAAGGGC 


GTTGCTTTCC 


GAAAGGCTTT 


1240 




ACGCCGCCCT 


CCTGGAGCGG 


CTTAAGGGGG 


AGGAGAGGCT 


1280 




TCTTTGGCTT 


TACGAGGAGG 


TGGAAAAGCC 


CCTTTCGCGG 


1320 


25 


GTCCTGGCCC 


ACATGGAGGC 


CACGGGGGTA 


CGGTTGGATG 


1360 




TGGCGTACTT 


AAAGGCCCTT 


TCCCTGGAGG 


TGGAGGCGGA 


1400 




GCTCAGGCGC 


CTCGAGGAGG 


AGGTCCACCG 


CCTGGCCGGG 


1440 




CATCCTTTCA 


ACCTGAACTC 


CCGGGACCAG 


CTGGAAAGGG 


1480 




TCCTCTTTGA 


CGAGCTTGGG 


CTTCCCGCCA 


TCGGCAAGAC 


1520 


30 


GGAGAAGACG 


GGGAAGCGCT 


CCACCAGCGC 


CGCCGTTTTG 


1560 




GAGGCCTTGC 


GGGAGGCTCA 


TCCCATCGTG 


GACCGCATCC 


1600 




TTCAGTACCG 


GGAGCTTTCC 


AAGCTCAAGG 


GAACCTACAT 


1640 




CGATCCCTTG 


CCTGCCCTGG 


TCCACCCCAA 


GACGAACCGC 


1680 




CTCCACACCC 


GTTTCAACCA 


GACGGCCACC 


GCCACGGGGA 


1720 


35 


GGCTTAGCAG 


CTCGGATCCC 


AACCTGCAAA 


ATATCCCCGT 


1760 




GCGCACCCCT 


TTGGGCCAGC 


GGATCCGCCG 


GGCCTTCGTG 


1800 




GCCGAGGAGG 


GGTGGAGGCT 


GGTGGTTTTG 


GACTACAGCC 


1840 






AGATTGAGCT 


CAGGGTCCTG 


GCGCACCTTT 


CCGGGGACGA 


1880 


GAACCTAATC 


CGGGTCTTCC 


AGGAGGGCCA 


GGACATCCAC 


1920 


ACCCAGACGG 


CCAGCTGGAT 


GTTCGGCGTG 


CCCCCAGAGG 


1960 


CCGTGGATTC 


CCTGATGCGT 


CGGGCGGCCA 


AGACCATCAA 


2000 


CTACGGCGTC 


CTCTACGGCA 


TGTCCGCCCA 


CCGGCTTTCG 


2040 


GGAGAGCTGG 


CCATCCCCTA 


CGAGGAGGCG 


GTGGCCTTCA 


2080 


TCGAGCGGTA 


TTTCCAGAGC 


TACCCCAAGG 


TGCGGGCCTG 


2120 


GATTGAGAAA 


ACCCTGGCGG 


AAGGACGGGA 


ACGGGGCTAT 


2160 


GTGGAAACCC 


TCTTTGGCCG 


CCGGCGCTAC 


GTGCCCGACT 


2200 


TGGCTTCCCG 


GGTGAAGAGC 


ATCCGGGAGG 


CAGCGGAGCG 


2240 


CATGGCCTTC 


AACATGCCGG 


TCCAGGGGAC 


CGCCGCGGAT 


2280 


TTGATGAAAC 


TGGCCATGGT 


GAAGCTCTTT 


CCCAGGCTTC 


2320 


AGGAGCTGGG 


GGCCAGGATG 


CTTTTGCAGG 


TGCACGACGA 


2360 


ACTGGTCCTC 


GAGGCTCCCA 


AGGAGCAAGC 


GGAGGAAGTC 


2400 


GCCCAGGAGG 


CCAAGCGGAC 


CATGGAGGAG 


GTGTGGCCCC 


2440 


TGAAGGTGCC 


CTTGGAGGTG 


GAAGTGGGCA 


TCGGGGAGGA 


2480 


CTGGCTTTCC 


GCCAAGGCCT 


AG 




2502 



In another embodiment, the invention provides a nucleic acid of SEQ ID
NO:11, a derivative nucleic acid related to Thermus scotoductus, strain SM3,
having GAC (encoding Asp) in place of GGG (encoding Gly) at positions
136-138 and having TAC (encoding Tyr) in place of TTC (encoding Phe) at
positions 2002-04. SEQ ID NO:11 is provided below:



ATGAGGGCGA 


TGCTGCCCCT 


CTTTGAGCCC 


AAGGGCCGGG 


40 


TGCTTCTGGT 


GGACGGCCAC 


CACCTGGCCT 


ACCGTACCTT 


80 


TTTTGCCCTG 


AAGGGCCTCA 


CCACCAGCCG 


CGGGGAGCCG 


120 


GTCCAGGCGG 


TGTACGACTT 


TGCCAAGAGC 


CTTTTGAAGG 


160 


CGCTAAGGGA 


AGACGGGGAT 


GTGGTGATCG 


TGGTGTTTGA 


200 


CGCCAAGGCC 


CCCTCCTTCC 


GCCACCAGAC 


CTACGAGGCC 


240 


TACAAGGCGG 


GGCGGGCTCC 


CACCCCCGAG 


GACTTTCCCC 


280 


GGCAGCTTGC 


CCTTATCAAG 


GAGATGGTGG 


ACCTTTTGGG 


320 


CCTGGAGCGC 


CTCGAAGTGC 


CGGGTTTTGA 


GGCGGATGAC 


360 


GTCCTGGCCA 


CCCTGGCCAA 


GAAGGCGGAA 


AAGGAAGGCT 


400 


ACGAGGTGCG 


CATCCTCACC 


GCGGACCGGG 


ACCTTTACCA 


440 








GCTTCTTTCG 


GACCGAATCT 


CCATCCTTCA 


CCCGGAGGGT 


480 




TACCTGATCA 


CCCCGGAGTG 


GCTTTGGGAG 


AAGTATGGGC 


520 




TTAAGCCTTC 


CCAGTGGGTG 


GACTACCGGG 


CCTTGGCCGG 


560 




GGACCCTTCC 


GACAACATCC 


CCGGCGTGAA 


GGGCATCGGG 


600 


5 


GAGAAGACGG 


CGGCCAAGCT 


GATCCGGGAG 


TGGGGAAGCC 


640 




TGGAAAACCT 


TCTTAAGCAC 


CTGGAACAGG 


TGAAACCTGC 


680 




CTCCGTGCGG 


GAGAAGATCC 


TTAGCCACAT 


GGAGGACCTC 


720 




AAGCTATCCC 


TGGAGCTTTC 


CCGGGTGCAC 


ACGGAGTTGC 


760 




CCCTTCAGGT 


GGACTTCGCC 


CGGCGCCGGG 


AGCCGGACCG 


800 


10 


GGAAGGGCTT 


AAGGCCTTTT 


TGGAGAGGCT 


GGAGTTCGGA 


840 




AGCCTCCTCC 


ACGAGTTCGG 


CCTGTTGGAA 


AGCCCGGTGG 


880 




CGGCGGAGGA 


AGCTCCCTGG 


CCGCCCCCCG 


AGGGAGCCTT 


920 




CGTGGGGTAC 


GTTCTTTCCC 


GCCCCGAGCC 


CATGTGGGCG 


960 




GAGCTTAACG 


CCTTGGCCGC 


CGCCTGGGAG 


GGAAGGGTTT 


1000 


15 


ACCGGGCGGA 


GGATCCCTTG 


GAGGCCTTGC 


GGGGGCTTGG 


1040 




GGAGGTGAGG 


GGGCTTTTGG 


CCAAGGACCT 


GGCGGTGCTG 


1080 




GCCCTGAGGG 


AAGGGATTGC 


CCTGGCACAG 


GGCGACGACC 


1120 




CCATGCTCCT 


CGCCTACCTC 


CTGGATCCTT 


CCAACACCGC 


1160 




CCCCGAAGGG 


GTAGCCCGGC 


GCTACGGGGG 


GGAGTGGACC 


1200 


20 


GAGGAGGCGG 


GGGAAAGGGC 


GCTGCTTTCC 


GAAAGGCTTT 


1240 




ACGCCGCCCT 


CCTGGAGCGG 


CTTAAGGGGG 


AGGAGAGGCT 


1280 




TCTTTGGCTT 


TACGAGGAGG 


TGGAAAAGCC 


CCTTTCGGGG 


1320 




GTCCTGGCCC 


ACATGGAGGC 


CACGGGGGTA 


TGGTTGGATG 


1360 




TGGCCTACTT 


GAAGGCCCTT 


TCCCTGGAGG 


TGGAGGCGGA 


1400 


25 


GCTCAGGCGC 


CTCGAGGAGG 


AGGTCCACCG 


ACTGGCCGGG 


1440 




CATCCTTTCA 


ACCTGAACTC 


CCGGGACCAG 


CTGGAAAGGG 


1480 




TCCTCTTTGA 


CGAGCTTGGG 


CTTCCCGCCA 


TCGGCAAGAC 


1520 




GGAGAAGACG 


GGTAAGCGTT 


CGACCAGCGC 


CGCCGTTTTG 


1560 




GAGGCTTTGA 


GGGAGGCTCA 


TCCCATAGTG 


GACCGCATCC 



1600 


30 


TCCAGTACCG 


GGAGCTTTCC 


AAGCTCAAGG 


GAACGTACAT 


1640 




CGATCCCTTG 


CCCGCCCTGG 


TCCACCCCAA 


GACGAACCGC 


1680 




CTCCACACCC 


GTTTCAACCA 


GACGGCCACC 


GCCACGGGGA 


1720 




GGCTTAGCAG 


CTCGGATCCC 


AACCTGCAAA 


ATATCCCCGT 


1760 






GCGCACCCCT 


TTAGGCCAGC 


GGATCCGCCG 


GGCCTTCGTG 


1800 


GCCGAGGAGG 


GGTGGAGGCT 


GGTGGTTTTG 


GACTACAGCC 


1840 


AGATTGAGCT 


CAGGGTCCTG 


GCGCACCTTT 


CCGGGGACGA 


1880 


GAACCTGATC 


CGGGTCTTCC 


AAGAGGGCCA 


GGACATCCAC 


1920 


ACCCAGACGG 


CCAGCTGGAT 


GTTCGGCGTG 


CCCCCAGAGG 


1960 


CCGTGGATTC 


CCTGATGCGC 


CGGGCGGCCA 


AGACCATCAA 


2000 


CTACGGCGTC 


CTCTACGGCA 


TGTCCGCCCA 


GCGGCTTTCG 


2040 


GGAGAGCTGG 


CCATCCCCTA 


CGAGGAAGCG 


GTGGCCTTCA 


2080 


TCGAGCGGTA 


TTTCCAGAGC 


TACCCCAAGG 


TACGGGCCTG 


2120 


GATTGAGAAA 


ACCCTGGCGG 


AAGGACGGGA 


GCGGGGCTAT 


2160 


GTGGAAACCC 


TCTTTGGCCG 


CCGGCGCTAT 


GTGCCCGACT 


2200 


TGGCTTCCCG 


GGTGAAGAGC 


ATCCGGGAGG 


CAGCGGAGCG 


2240 


CATGGCCTTC 


AACATGCCGG 


TCCAGGGGAC 


CGCCGCGGAT 


2280 


TTGATGAAAC 


TGGCCATGGT 


GAAGCTCTTT 


CCCAGGCTTC 


2320 


AGGAGCTGGG 


GGCCAGGATG 


CTTTTGCAGG 


TGCACGACGA 


2360 


ACTGGTCCTC 


GAGGCTCCCA 


AGGAGCAAGC 


GGAGGAAGTC 


2400 


GCCCAGGAGG 


CCAAGCGGAC 


CATGGAGGAG 


GTGTGGGCCC 


2440 


TGAAGGTGCC 


CTTGGAGGTG 


GAGGTGGGTA 


TCGGGGAGGA 


2480 


CTGGCTTTCC 


GCCAAGGCCT 


AGTCGAC

In another embodiment, the invention provides a nucleic acid of SEQ ID
NO:12, a derivative nucleic acid related to Thermus scotoductus, strain Vi7a,
having GAC (encoding Asp) in place of GGG (encoding Gly) at positions
136-138 and having TAC (encoding Tyr) in place of TTC (encoding Phe) at
positions 2101-03. SEQ ID NO:12 is provided below:



ATGAGGGCGA TGCTGCCCCT CTTTGAGCCC AAGGGCCGGG 40
TGCTTCTGGT GGACGGCCAC CACCTGGCCT ACCGTACCTT 80
TTTTGCCCTG AAGGGCCTCA CCACCAGCCG CGGGGAGCCG 120
GTCCAGGCGG TGTACGACTT TGCCAAGAGC CTTTTGAAGG 160
CGCTAAGGGA AGACGGGGAT GTGGTGATCG TGGTGTTTGA 200
CGCCAAGGCC CCCTCCTTCC GCCACCAGAC CTACGAGGCC 240
TACAAGGCGG GGCGGGCTCC CACCCCCGAG GACTTTCCCC 280



GGCAGCTTGC CCTTATCAAG GAGATGGTGG ACCTTTTGGG 320
CCTGGAGCGC CTCGAAGTGC CGGGTTTTGA GGCGGATGAC 360
GTCCTGGCCA CCCTGGCCAA GAAGGCGGAA AAGGAAGGCT 400
ACGAGGTGCG CATCCTCACC GCGGACCGGG ACCTTTACCA 440
GCTTCTTTCG GACCGAATCT CCATCCTTCA CCCGGAGGGT 480
TACCTGATTA CCCCGGAGTG GCTTTGGGAG AAGTATGGGG 520
TTAAGCCTTC CCAGTGGGTG GACTACCGGG CCTTGGCCGG 560
GGACCCTTCC GACAACATCC CCGGCGTGAA GGGCATCGGG 600
GAGAAGACGG CGGCCAAGCT GATCCGGGAG TGGGGAAGCC 640
TGGAAAACCT TCTTAAGCAC CTGGAACAGG TGAAACCTGC 680
CTCCGTGCGG GAGAAGATCC TTAGCCACAT GGAGGACCTC 720
AAGCTATCCC TGGAGCTTTC CCGGGTGCAC ACGGAGTTGC 760
CCCTTCAGGT GGACTTCGCC CGGCGCCGGG AGCCGGACCG 800
GGAAGGGCTT AAGGCCTTTT TGGAGAGGCT GGAGTTCGGA 840
AGCCTCCTCC ACGAGTTCGG CCTGTTGGAA AGCCCGGTGG 880
CGGCGGAGGA AGCTCCCTGG CCGGCCCCCG AGGGAGCCTT 920
CGTGGGGTAC GTTCTTTCCC GCCCCGAGCC CATGTGGGCG 960
GAGCTTAACG CCTTGGCCGC CGGCTGGGAG GGAAGGGTTT 1000
ACCGGGCGGA GGATCCCTTG GAGGGCTTGC GGGGGCTTGG 1040
GGAGGTGAGG GGGCTTTTGG CCAAGGACCT GGCGGTGCTG 1080
GCCCTGAGGG AAGGGATTGC CCTGGCACCG GGCGACGACC 1120
CCATGCTCCT CGCCTACCTC CTGGATCCTT CCAACACCGC 1160
CCCCGAAGGG GTAGCCCGGC GCTACGGGGG GGAGTGGACC 1200
GAGGAGGCGG GGGAAAGGGC GCTGCTTTCC GAAAGGCTTT 1240
ACGCCGCCCT CCTGGAGCGG CTTAAGGGGG AGGAGAGGCT 1280
TCTTTGGCTT TACGAGGAGG TGGAAAAGCC CCTTTCGCGG 1320
GTCCTGGCCC ACATGGAGGC CACGGGGGTA TGGTTGGATG 1360
TGGCCTACTT GAAGGCCCTT TCCCTGGAGG TGGAGGCGGA 1400
GCTCAGGCGC CTCGAGGAGG AGGTCCACCG ACTGGCCGGG 1440
CATCCTTTCA ACCTGAACTC CCGGGACCAG CTGGAAAGGG 1480
TCCTCTTTGA CGAGCTTGGG CTTCCCGCCA TCGGCAAGAC 1520
GGAGAAGACG GGTAAGCGTT CCACCAGCGC CGCCGTTTTG 1560
GAGGCTTTGA GGGAGGCTCA TCCCATAGTG GACCGCATCC 1600
TCCAGTACCG GGAGCTTTCC AAGCTCAAGG GAACGTACAT 1640
CGATCCCTTG CCCGCCCTGG TCCACCCCAA GACGAACCGC 1680
CTCCACACCC GTTTCAACCA GACGGCCACC GCCACGGGGA 1720
GGCTTAGCAG CTCGGATCCC AACCTGCAAA ATATCCCCGT 1760
GCGCACCCCT TTAGGCCAGC GGATCCGCCG GGCCTTCGTG 1800
GCCGAGGAGG GGTGGAGGCT GGTGGTTTTG GACTACAGCC 1840
AGATTGAGCT CAGGGTCCTG GCGCACCTTT CCGGGGACGA 1880
GAACCTGATC CGGGTCTTCC AAGAGGGCCA GGACATCCAC 1920
ACCCAGACGG CCAGCTGGAT GTTCGGCGTG CCCCCAGAGG 1960
CCGTGGATTC CCTGATGCGC CGGGCGGCCA AGACCATCAA 2000
CTACGGCGTC CTCTACGGCA TGTCCGCCCA CCGGCTTTCG 2040
GGAGAGCTGG CCATCCCCTA CGAGGAAGCG GTGGGCTTCA 2080
TCGAGGGGTA TTTCCAGAGC TACCCCAAGG TACGGGCCTG 2120
GATTGAGAAA ACCCTGGCGG AAGGACGGGA GCGGGGCTAT 2160
GTGGAAACCC TCTTTGGCCG CCGGCGCTAT GTGCCCGACT 2200
TGGCTTCCCG GGTGAAGAGC ATCCGGGAGG CAGCGGAGCG 2240
CATGGCCTTC AACATGCCGG TCCAGGGGAC CGCCGCGGAT 2280
TTGATGAAAC TGGCCATGGT GAAGCTCTTT CCCAGGCTTC 2320
AGGAGCTGGG GGCCAGGATG CTTTTGCAGG TGCACGACGA 2360
ACTGGTCCTC GAGGCTCCCA AGGAGCAAGC GGAGGAAGTC 2400
GCCCAGGAGG CCAAGCGGAC CATGGAGGAG GTGTGGCCGC 2440
TGAAGGTGCC CTTGGAGGTG GAGGTGGGTA TCGGGGAGGA 2480
CTGGCTTTCC GCCAAGGCCT
AGTCGAC

The substitution of TAC (encoding Tyr) for TTC (encoding Phe) at the indicated
positions can reduce discrimination against ddNTP incorporation by DNA
polymerase I. See, e.g., U.S. Patent 5,614,365, which is incorporated herein by
reference. The substitution of GAC (encoding Asp) for GGG (encoding Gly) at
the indicated positions removes the 5'-3' exonuclease activity.
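By way of illustration only, the codon substitutions described above can be sketched in Python as follows. The function name substitute_codon and the 12-nucleotide toy sequence are hypothetical and are not taken from any SEQ ID NO; positions are 1-based, as in the sequence listings. The sketch checks that the expected codon is actually present before replacing it.

```python
def substitute_codon(seq, start, expected, replacement):
    """Replace the codon that begins at 1-based position `start`.

    Raises ValueError if the codon found there is not `expected`,
    so a mis-specified position is caught rather than silently edited.
    """
    i = start - 1  # convert the 1-based position to a 0-based index
    found = seq[i:i + 3]
    if found != expected:
        raise ValueError(f"expected {expected} at {start}, found {found}")
    return seq[:i] + replacement + seq[i + 3:]


# Hypothetical toy sequence: ATG GGG TTC TAA (Met Gly Phe Stop).
toy = "ATGGGGTTCTAA"
step1 = substitute_codon(toy, 4, "GGG", "GAC")    # Gly -> Asp
step2 = substitute_codon(step1, 7, "TTC", "TAC")  # Phe -> Tyr
print(step2)
```

The same call pattern could be applied to a full-length sequence by passing the positions recited for the relevant SEQ ID NO.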

The nucleic acids of the invention have homology to portions of the
nucleic acids encoding the thermostable DNA polymerases of Thermus
aquaticus and Thermus thermophilus (see Figure 1). However, significant
portions of the nucleic acid sequences of the present invention are distinct.

The invention also encompasses fragment and variant nucleic acids of
SEQ ID NO:1-12. Nucleic acid "fragments" encompassed by the invention are
of two general types. First, fragment nucleic acids that do not encode a
full-length nucleic acid polymerase but do encode a thermally stable polypeptide
with nucleic acid polymerase activity are encompassed within the invention.
Second, fragment nucleic acids useful as hybridization probes but that generally
do not encode polymerases retaining biological activity are also encompassed
within the invention. Thus, fragments of nucleotide sequences such as SEQ ID
NO:1-12 may be as small as about 9 nucleotides, about 12 nucleotides, about
15 nucleotides, about 17 nucleotides, about 18 nucleotides, about 20 nucleotides,
about 50 nucleotides, about 100 nucleotides or more. In general, a fragment
nucleic acid of the invention can have any upper size limit so long as it is related
in sequence to the nucleic acids of the invention but is not full length.
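As a non-limiting illustration of the fragment sizes mentioned above, the following Python sketch enumerates every contiguous fragment of a chosen length from a longer sequence. The function name fragments and the 9-nucleotide window are assumptions made for the example; the input is the first 40 nucleotides shared by the sequences listed above.

```python
def fragments(seq, length):
    """Return every contiguous fragment of `length` nucleotides.

    The full-length sequence itself is excluded, since a fragment
    is by definition shorter than the sequence it is taken from.
    """
    if length >= len(seq):
        return []
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


# First 40 nucleotides common to the listed sequences.
first_row = "ATGAGGGCGATGCTGCCCCTCTTTGAGCCCAAGGGCCGGG"
nine_mers = fragments(first_row, 9)
print(len(nine_mers), nine_mers[0])
```

Whether a given fragment is useful as a hybridization probe depends on factors this sketch does not model, such as uniqueness and melting temperature.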

As indicated above, "variants" are substantially similar or substantially
homologous sequences. For nucleotide sequences, variants include those
sequences that, because of the degeneracy of the genetic code, encode the
identical amino acid sequence of the native nucleic acid polymerase protein.
Variant nucleic acids also include those that encode polypeptides that do not
have amino acid sequences identical to that of a native nucleic acid polymerase
protein, but that encode an active, thermally stable nucleic acid polymerase with
conservative changes in the amino acid sequence.

As is known by one of skill in the art, the genetic code is "degenerate,"
meaning that several trinucleotide codons can encode the same amino acid. This
degeneracy is apparent from Table 1.



Table 1

                                Second Position
1st Position    T            C            A             G             3rd Position
T               TTT = Phe    TCT = Ser    TAT = Tyr     TGT = Cys     T
T               TTC = Phe    TCC = Ser    TAC = Tyr     TGC = Cys     C
T               TTA = Leu    TCA = Ser    TAA = Stop    TGA = Stop    A
T               TTG = Leu    TCG = Ser    TAG = Stop    TGG = Trp     G
C               CTT = Leu    CCT = Pro    CAT = His     CGT = Arg     T
C               CTC = Leu    CCC = Pro    CAC = His     CGC = Arg     C
C               CTA = Leu    CCA = Pro    CAA = Gln     CGA = Arg     A
C               CTG = Leu    CCG = Pro    CAG = Gln     CGG = Arg     G
A               ATT = Ile    ACT = Thr    AAT = Asn     AGT = Ser     T
A               ATC = Ile    ACC = Thr    AAC = Asn     AGC = Ser     C
A               ATA = Ile    ACA = Thr    AAA = Lys     AGA = Arg     A
A               ATG = Met    ACG = Thr    AAG = Lys     AGG = Arg     G
G               GTT = Val    GCT = Ala    GAT = Asp     GGT = Gly     T
G               GTC = Val    GCC = Ala    GAC = Asp     GGC = Gly     C
G               GTA = Val    GCA = Ala    GAA = Glu     GGA = Gly     A
G               GTG = Val    GCG = Ala    GAG = Glu     GGG = Gly     G



Hence, many changes in the nucleotide sequence of the variant may be silent and
may not alter the amino acid sequence encoded by the nucleic acid. Where
nucleic acid sequence alterations are silent, a variant nucleic acid will encode a
polypeptide with the same amino acid sequence as the reference nucleic acid.
Therefore, a particular nucleic acid sequence of the invention also encompasses
variants with degenerate codon substitutions, and complementary sequences
thereof, as well as the sequence explicitly specified by a SEQ ID NO.
Specifically, degenerate codon substitutions may be achieved by generating
sequences in which the reference codon is replaced by any of the codons for the
amino acid specified by the reference codon. In general, the third position of
one or more selected codons can be substituted with mixed-base and/or
deoxyinosine residues as disclosed by Batzer et al., Nucleic Acid Res., 19, 5081
(1991) and/or Ohtsuka et al., J. Biol. Chem., 260, 2605 (1985); Rossolini et al.,
Mol. Cell. Probes, 8, 91 (1994).
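
The notion of a silent, degenerate codon substitution can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions: it encodes the standard genetic code of Table 1, and the function names (translate, synonymous_codons, is_silent_variant) and the nine-nucleotide example sequences are hypothetical and are not taken from any SEQ ID NO.

```python
from itertools import product

BASES = "TCAG"
# Amino acids in the order obtained by iterating the first, second and
# third codon positions over T, C, A, G ("*" marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(codon):
    """Return every codon that encodes the same amino acid as `codon`."""
    amino_acid = CODON_TABLE[codon.upper()]
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def is_silent_variant(reference, variant):
    """True when the variant encodes the same amino acid sequence."""
    return translate(reference) == translate(variant)


if __name__ == "__main__":
    reference = "ATGCTTTCA"  # hypothetical example: Met-Leu-Ser
    variant = "ATGCTGAGC"    # different Leu and Ser codons, same protein
    print(translate(reference), translate(variant))
    print(synonymous_codons("CTT"))
    print(is_silent_variant(reference, variant))
```

In this sketch, replacing CTT with any codon returned by synonymous_codons("CTT") leaves the encoded leucine unchanged, which is the sense in which such a change is silent.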

However, the invention is not limited to silent changes in the present
nucleotide sequences but also includes variant nucleic acid sequences that
conservatively alter the amino acid sequence of a polypeptide of the invention.
According to the present invention, variant and reference nucleic acids of the
invention may differ in the encoded amino acid sequence by one or more
substitutions, additions, insertions, deletions, fusions and truncations, which may
be present in any combination, so long as an active, thermally stable nucleic acid
polymerase is encoded by the variant nucleic acid. Such variant nucleic acids
will not encode exactly the same amino acid sequence as the reference nucleic
acid, but have conservative sequence changes.

Variant nucleic acids with silent and conservative changes can be defined
and characterized by the degree of homology to the reference nucleic acid.
Preferred variant nucleic acids are "substantially homologous" to the reference
nucleic acids of the invention. As recognized by one of skill in the art, such
substantially similar nucleic acids can hybridize under stringent conditions with
the reference nucleic acids identified by SEQ ID NOs herein. These types of
substantially homologous nucleic acids are encompassed by this invention.

Generally, nucleic acid derivatives and variants of the invention will have
at least 90%, 91%, 92%, 93% or 94% sequence identity to the reference
nucleotide sequence defined herein. Preferably, nucleic acids of the invention
will have at least 95%, 96%, 97%, 98%, or 99% sequence identity to the
reference nucleotide sequence defined herein.

Variant nucleic acids can be detected and isolated by standard
hybridization procedures.

Hybridization to detect or isolate such sequences is generally carried out
under stringent conditions. "Stringent hybridization conditions" and "stringent
hybridization wash conditions" in the context of nucleic acid hybridization
experiments such as Southern and Northern hybridization are sequence
dependent, and are different under different environmental parameters. Longer
sequences hybridize specifically at higher temperatures. An extensive guide to
the hybridization of nucleic acids is found in Tijssen, Laboratory Techniques in
Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes,
Part 1, Chapter 2 "Overview of principles of hybridization and the strategy of
nucleic acid probe assays" Elsevier, New York (1993). See also, J. Sambrook et
al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, N.Y.,
pp 9.31-9.58 (1989); J. Sambrook et al., Molecular Cloning: A Laboratory
Manual, Cold Spring Harbor Press, N.Y. (3rd ed. 2001).

The invention also provides methods for detection and isolation of
derivative or variant nucleic acids encoding nucleic acid polymerase activity.
The methods involve hybridizing at least a portion of a nucleic acid comprising
any one of SEQ ID NO: 1-12 to a sample nucleic acid, thereby forming a
hybridization complex; and detecting the hybridization complex. The presence
of the complex correlates with the presence of a derivative or variant nucleic
acid encoding at least a segment of nucleic acid polymerase. In general, the
portion of a nucleic acid comprising any one of SEQ ID NO: 1-12 used for
hybridization is at least fifteen nucleotides, and hybridization is under
hybridization conditions that are sufficiently stringent to permit detection and
isolation of substantially homologous nucleic acids. In an alternative
embodiment, a nucleic acid sample is amplified by the polymerase chain
reaction using primer oligonucleotides selected from any one of SEQ ID NO: 1-12.

Generally, highly stringent hybridization and wash conditions are
selected to be about 5°C lower than the thermal melting point (Tm) for the
specific double-stranded sequence at a defined ionic strength and pH. For
example, under "highly stringent conditions" or "highly stringent hybridization
conditions" a nucleic acid will hybridize to its complement to a detectably
greater degree than to other sequences (e.g., at least 2-fold over background).
By controlling the stringency of the hybridization and/or washing conditions,
nucleic acids that are 100% complementary can be identified.

Alternatively, stringency conditions can be adjusted to allow some
mismatching in sequences so that lower degrees of similarity are detected
(heterologous probing). Typically, stringent conditions will be those in which
the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0
M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at
least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about
60°C for long probes (e.g., greater than 50 nucleotides). Stringent conditions
may also be achieved with the addition of destabilizing agents such as
formamide.

Exemplary low stringency conditions include hybridization with a buffer
solution of 30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl
sulphate) at 37°C, and a wash in 1X to 2X SSC (20X SSC = 3.0 M NaCl and 0.3
M trisodium citrate) at 50 to 55°C. Exemplary moderate stringency conditions
include hybridization in 40 to 45% formamide, 1.0 M NaCl, 1% SDS at 37°C,
and a wash in 0.5X to 1X SSC at 55 to 60°C. Exemplary high stringency
conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at
37°C, and a wash in 0.1X SSC at 60 to 65°C.

The degree of complementarity or homology of hybrids obtained during
hybridization is typically a function of post-hybridization washes, the critical
factors being the ionic strength and temperature of the final wash solution. The
type and length of hybridizing nucleic acids also affect whether hybridization
will occur and whether any hybrids formed will be stable under a given set of
hybridization and wash conditions. For DNA-DNA hybrids, the Tm can be
approximated from the equation of Meinkoth and Wahl, Anal. Biochem.
138:267-284 (1984):

Tm = 81.5°C + 16.6 (log M) + 0.41 (%GC) - 0.61 (% form) - 500/L

where M is the molarity of monovalent cations, %GC is the percentage of
guanosine and cytosine nucleotides in the DNA, % form is the percentage of
formamide in the hybridization solution, and L is the length of the hybrid in base
pairs. The Tm is the temperature (under defined ionic strength and pH) at which
50% of a complementary target sequence hybridizes to a perfectly matched
probe. Very stringent conditions are selected to be equal to the Tm for a particular
probe.
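
As a worked illustration of the Meinkoth and Wahl approximation, the following Python sketch evaluates the equation for one set of inputs. It is a minimal sketch under stated assumptions: the logarithm is taken as base 10, and the function name and the example values (1.0 M monovalent cation, 50% GC, 50% formamide, a 500 base pair hybrid) are hypothetical choices made for the example only.

```python
import math


def melting_temperature(monovalent_molarity, percent_gc,
                        percent_formamide, hybrid_length_bp):
    """Approximate Tm (degrees C) of a DNA-DNA hybrid.

    Tm = 81.5 + 16.6 (log M) + 0.41 (%GC) - 0.61 (% form) - 500/L
    The logarithm is assumed to be base 10.
    """
    if monovalent_molarity <= 0 or hybrid_length_bp <= 0:
        raise ValueError("molarity and hybrid length must be positive")
    return (81.5
            + 16.6 * math.log10(monovalent_molarity)
            + 0.41 * percent_gc
            - 0.61 * percent_formamide
            - 500.0 / hybrid_length_bp)


if __name__ == "__main__":
    # Hypothetical inputs: 1.0 M Na+, 50% GC, 50% formamide, 500 bp hybrid.
    tm = melting_temperature(1.0, 50.0, 50.0, 500)
    print(f"Approximate Tm: {tm:.1f} C")
```

With these illustrative inputs the salt term vanishes (log 1.0 = 0), the GC term adds 20.5°C, the formamide term subtracts 30.5°C and the length term subtracts 1°C.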

An example of stringent hybridization conditions for hybridization of
complementary nucleic acids that have more than 100 complementary residues
on a filter in a Southern or Northern blot is 50% formamide with 1 mg of heparin
at 42°C, with the hybridization being carried out overnight. An example of
highly stringent conditions is 0.15 M NaCl at 72°C for about 15 minutes. An
example of stringent wash conditions is a 0.2x SSC wash at 65°C for 15 minutes
(see also, Sambrook, infra). Often, a high stringency wash is preceded by a low
stringency wash to remove background probe signal. An example of medium
stringency for a duplex of, e.g., more than 100 nucleotides, is 1x SSC at 45°C
for 15 minutes. An example low stringency wash for a duplex of, e.g., more
than 100 nucleotides, is 4-6x SSC at 40°C for 15 minutes. For short probes
(e.g., about 10 to 50 nucleotides), stringent conditions typically involve salt
concentrations of less than about 1.0 M Na ion, typically about 0.01 to 1.0 M Na
ion concentration (or other salts) at pH 7.0 to 8.3, and the temperature is
typically at least about 30°C.

Stringent conditions can also be achieved with the addition of
destabilizing agents such as formamide. In general, a signal to noise ratio of 2x
(or higher) than that observed for an unrelated probe in the particular
hybridization assay indicates detection of a specific hybridization. Nucleic acids
that do not hybridize to each other under stringent conditions are still
substantially identical if the proteins that they encode are substantially identical.
This occurs, e.g., when a copy of a nucleic acid is created using the maximum
codon degeneracy permitted by the genetic code.

The following are examples of sets of hybridization/wash conditions that
may be used to detect and isolate homologous nucleic acids that are substantially
identical to reference nucleic acids of the present invention: a reference
nucleotide sequence preferably hybridizes to the reference nucleotide sequence
in 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at 50°C with
washing in 2X SSC, 0.1% SDS at 50°C, more desirably in 7% sodium dodecyl
sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at 50°C with washing in 1X SSC,
0.1% SDS at 50°C, more desirably still in 7% sodium dodecyl sulfate (SDS), 0.5
M NaPO4, 1 mM EDTA at 50°C with washing in 0.5X SSC, 0.1% SDS at 50°C,
preferably in 7% sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at
50°C with washing in 0.1X SSC, 0.1% SDS at 50°C, more preferably in 7%
sodium dodecyl sulfate (SDS), 0.5 M NaPO4, 1 mM EDTA at 50°C with
washing in 0.1X SSC, 0.1% SDS at 65°C.

In general, Tm is reduced by about 1°C for each 1% of mismatching.
Thus, Tm, hybridization, and/or wash conditions can be adjusted to hybridize to
sequences of the desired sequence identity. For example, if sequences with
>90% identity are sought, the Tm can be decreased 10°C. Generally, stringent
conditions are selected to be about 5°C lower than the thermal melting point
(Tm) for the specific sequence and its complement at a defined ionic strength and
pH. However, severely stringent conditions can utilize a hybridization and/or
wash at 1, 2, 3, or 4°C lower than the thermal melting point (Tm); moderately
stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10°C
lower than the thermal melting point (Tm); low stringency conditions can utilize
a hybridization and/or wash at 11, 12, 13, 14, 15, or 20°C lower than the thermal
melting point (Tm).
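
The arithmetic relating mismatch, Tm and wash temperature can be sketched as follows. This Python fragment is an illustration only: it assumes exactly 1°C of Tm reduction per 1% mismatch (the text says "about"), and the function name, parameter names and the example perfect-match Tm of 70.5°C are hypothetical.

```python
def wash_temperature(tm_perfect_match, min_identity_percent,
                     degrees_below_tm=5.0):
    """Estimate a hybridization/wash temperature in degrees C.

    tm_perfect_match     -- Tm of the perfectly matched hybrid
    min_identity_percent -- lowest sequence identity that should still hybridize
    degrees_below_tm     -- stringency offset below the adjusted Tm
                            (about 5 for stringent, 1 to 4 for severely
                            stringent, 6 to 10 for moderately stringent)
    """
    if not 0 < min_identity_percent <= 100:
        raise ValueError("identity must be in the range (0, 100]")
    mismatch_percent = 100.0 - min_identity_percent
    # Assumption: Tm falls by 1 degree C per 1% of mismatching.
    tm_adjusted = tm_perfect_match - 1.0 * mismatch_percent
    return tm_adjusted - degrees_below_tm


if __name__ == "__main__":
    # Hypothetical probe with a perfect-match Tm of 70.5 C; sequences with
    # at least 90% identity are sought under stringent (Tm - 5 C) conditions.
    print(wash_temperature(70.5, 90.0))
```

For the illustrative values shown, the Tm is first lowered by 10°C to allow for 10% mismatch and then by a further 5°C for the stringency offset.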

If the desired degree of mismatching results in a Tm of less than 45°C
(aqueous solution) or 32°C (formamide solution), it is preferred to increase the
SSC concentration so that a higher temperature can be used. An extensive guide
to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory
Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic
Acid Probes, Part 1, Chapter 2 (Elsevier, New York); and Ausubel et al., eds.
(1995) Current Protocols in Molecular Biology, Chapter 2 (Greene Publishing
and Wiley-Interscience, New York). See Sambrook et al. (1989) Molecular
Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor Laboratory Press,
Plainview, New York). Using these references and the teachings herein on the
relationship between Tm, mismatch, and hybridization and wash conditions,
those of ordinary skill can generate variants of the present nucleic acid
polymerase nucleic acids.

Computer analyses can also be utilized for comparison of sequences to
determine sequence identity. Such analyses include, but are not limited to:
CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain
View, California); the ALIGN program (Version 2.0) and GAP, BESTFIT,
BLAST, FASTA, and TFASTA in the Wisconsin Genetics Software Package,
Version 8 (available from Genetics Computer Group (GCG), 575 Science Drive,
Madison, Wisconsin, USA). Alignments using these programs can be performed
using the default parameters. The CLUSTAL program is well described by
Higgins et al. Gene 73:237-244 (1988); Higgins et al. CABIOS 5:151-153
(1989); Corpet et al. Nucleic Acids Res. 16:10881-90 (1988); Huang et al.
CABIOS 8:155-65 (1992); and Pearson et al. Meth. Mol. Biol. 24:307-331
(1994). The ALIGN program is based on the algorithm of Myers and Miller,
supra. The BLAST programs of Altschul et al., J. Mol. Biol. 215:403 (1990),
are based on the algorithm of Karlin and Altschul supra. To obtain gapped
alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be
utilized as described in Altschul et al. Nucleic Acids Res. 25:3389 (1997).
Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated
search that detects distant relationships between molecules. See Altschul et al.,
supra. When utilizing BLAST, Gapped BLAST, PSI-BLAST, the default
parameters of the respective programs (e.g. BLASTN for nucleotide sequences,
BLASTX for proteins) can be used. The BLASTN program (for nucleotide
sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a
cutoff of 100, M = 5, N = -4, and a comparison of both strands. For amino acid
sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an
expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff &
Henikoff, Proc. Natl. Acad. Sci. USA, 89, 10915 (1989)). See
http://www.ncbi.nlm.nih.gov. Alignment may also be performed manually by
inspection.

For purposes of the present invention, comparison of nucleotide
sequences for determination of percent sequence identity to the nucleic acid
polymerase sequences disclosed herein is preferably made using the BlastN
program (version 1.4.7 or later) with its default parameters or any equivalent
program. By "equivalent program" is intended any sequence comparison
program that, for any two sequences in question, generates an alignment having
identical nucleotide or amino acid residue matches and an identical percent
sequence identity when compared to the corresponding alignment generated by
the preferred program.
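
Once an alignment has been produced, percent sequence identity reduces to counting identical residue matches. The Python sketch below shows that counting step only. It is not BlastN and does not perform the alignment; it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and it assumes that identity is reported over all aligned columns. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns in which both sequences carry a gap are ignored; a residue
    aligned against a gap counts as a mismatch (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # skip columns that are gaps in both sequences
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns to compare")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Hypothetical ten-column alignment with one gap and one mismatch.
    identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
    print(f"{identity:.1f}% identity")
    print("meets 90% threshold:", identity >= 90.0)
```

A variant would fall within the generally stated range of the invention when the value computed from the preferred program's alignment is at least 90%.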

Expression of Nucleic Acids Encoding Polymerases 

Nucleic acids of the invention may be used for the recombinant
expression of the nucleic acid polymerase polypeptides of the invention.
Generally, recombinant expression of a nucleic acid polymerase polypeptide of
the invention is effected by introducing a nucleic acid encoding that polypeptide
into an expression vector adapted for use in a particular type of host cell. The
nucleic acids of the invention can be introduced and expressed in any host
organism, for example, in either prokaryotic or eukaryotic host cells. Examples of
host cells include bacterial cells, yeast cells, cultured insect cell lines, and
cultured mammalian cell lines. Preferably, the recombinant host cell system is
selected that processes and post-translationally modifies nascent polypeptides in
a manner similar to that of the organism from which the nucleic acid polymerase
was derived. For purposes of expressing and isolating nucleic acid polymerase
polypeptides of the invention, prokaryotic organisms are preferred, for example,
Escherichia coli. Accordingly, the invention provides host cells comprising the
expression vectors of the invention.

The nucleic acids to be introduced can be conveniently placed in
expression cassettes for expression in an organism of interest. Such expression
cassettes will comprise a transcriptional initiation region linked to a nucleic acid
of the invention. Expression cassettes preferably also have a plurality of
restriction sites for insertion of the nucleic acid to be under the transcriptional
regulation of various control elements. The expression cassette additionally may
contain selectable marker genes. Suitable control elements such as
enhancers/promoters, splice junctions, polyadenylation signals, etc. may be
placed in close proximity to the coding region of the gene if needed to permit
proper initiation of transcription and/or correct processing of the primary RNA
transcript. Alternatively, the coding region utilized in the expression vectors of
the present invention may contain endogenous enhancers/promoters, splice
junctions, intervening sequences, polyadenylation signals, etc., or a combination
of both endogenous and exogenous control elements.

Preferably the nucleic acid in the vector is under the control of, and
operably linked to, an appropriate promoter or other regulatory elements for
transcription in a host cell. The vector may be a bi-functional expression vector
that functions in multiple hosts. The transcriptional cassette generally includes
in the 5'-3' direction of transcription, a promoter, a transcriptional and
translational initiation region, a DNA sequence of interest, and a transcriptional
and translational termination region functional in the organism. The termination
region may be native with the transcriptional initiation region, may be native
with the DNA sequence of interest, or may be derived from another source.

Efficient expression of recombinant nucleic acids in prokaryotic and
eukaryotic cells generally requires regulatory control elements directing the
efficient termination and polyadenylation of the resulting transcript.
Transcription termination signals are generally found downstream of the
polyadenylation signal and are a few hundred nucleotides in length. The term
"poly A site" or "poly A sequence" as used herein denotes a nucleic acid
sequence that directs both the termination and polyadenylation of the nascent
RNA transcript. Efficient polyadenylation of the recombinant transcript is
desirable as transcripts lacking a poly A tail are unstable and are rapidly
degraded.

Nucleic acids encoding nucleic acid polymerase may be introduced into
bacterial host cells by a method known to one of skill in the art. For example,
nucleic acids encoding a thermophilic nucleic acid polymerase can be introduced
into bacterial cells by commonly used transformation procedures such as by
treatment with calcium chloride or by electroporation. If the thermophilic
nucleic acid polymerase is to be expressed in eukaryotic host cells, nucleic acids
encoding the thermophilic nucleic acid polymerase may be introduced into
eukaryotic host cells by a number of means including calcium phosphate
co-precipitation, spheroplast fusion, electroporation and the like. When the
eukaryotic host cell is a yeast cell, transformation may be effected by treatment
of the host cells with lithium acetate or by electroporation.

Thus, one aspect of the invention is to provide expression vectors and
host cells comprising a nucleic acid encoding a nucleic acid polymerase
polypeptide of the invention. A wide range of expression vectors are well
known in the art. Description of various expression vectors and how to use them
can be found among other places in U.S. Pat. Nos. 5,604,118; 5,583,023;
5,432,082; 5,266,490; 5,063,158; 4,966,841; 4,806,472; 4,801,537; and Goedel
et al., Gene Expression Technology, Methods of Enzymology, Vol. 185,
Academic Press, San Diego (1989). The expression of nucleic acid polymerases
in recombinant cell systems is a well-established technique. Examples of the
recombinant expression of nucleic acid polymerase can be found in U.S. Pat.
Nos. 5,602,756; 5,545,552; 5,541,311; 5,500,363; 5,489,523; 5,455,170;
5,352,778; 5,322,785; and 4,935,361.

Recombinant DNA and molecular cloning techniques that can be used to
help make and use aspects of the invention are described by Sambrook et al.,
Molecular Cloning: A Laboratory Manual Vol. 1-3, Cold Spring Harbor
Laboratory, Cold Spring Harbor, N.Y. (2001); Ausubel (ed.), Current Protocols in
Molecular Biology, John Wiley and Sons, Inc. (1994); T. Maniatis, E. F. Fritsch
and J. Sambrook, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory, Cold Spring Harbor, N.Y. (1989); and by T. J. Silhavy, M. L.
Berman, and L. W. Enquist, Experiments with Gene Fusions, Cold Spring
Harbor Laboratory, Cold Spring Harbor, N.Y. (1984).

Nucleic Acid Polymerase Enzymes 

The invention provides Thermus scotoductus nucleic acid polymerase
polypeptides, as well as fragments thereof and variant nucleic acid polymerase
polypeptides that are active and thermally stable. Any polypeptide containing
an amino acid sequence having any one of SEQ ID NO:13-28, which are the amino
acid sequences for wild type and derivative Thermus scotoductus nucleic acid
polymerases, is contemplated by the present invention. The polypeptides of the
invention are isolated or substantially purified polypeptides. In particular, the
isolated polypeptides of the invention are substantially free of proteins normally
present in Thermus scotoductus bacteria.

In one embodiment, the invention provides a polypeptide of SEQ ID
NO:13, a wild type Thermus scotoductus nucleic acid polymerase polypeptide
from strain X-1 with three additional amino acids at the N-terminus:
In another embodiment, the invention provides SEQ ID NO:14, a wild type
Thermus scotoductus nucleic acid polymerase enzyme from strain X-1 that does
not have the three additional amino acids at the N-terminus that are present in
SEQ ID NO:13. SEQ ID NO:14 is provided below.
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV RLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINFGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833



In another embodiment, the invention provides SEQ ID NO:15, a wild type
Thermus scotoductus nucleic acid polymerase enzyme from strain SM3. SEQ
ID NO:15 is provided below.

MRAMLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYGFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS DRISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVH TELPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAQ GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV WLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINFGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833

In another embodiment, the invention provides SEQ ID NO:16, a wild type
Thermus scotoductus nucleic acid polymerase enzyme from strain Vi7a. SEQ
ID NO:16 is provided below.

MRAMLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYGFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS DRISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVH TELPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAP GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV WLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINFGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833

The sequences of wild type Thermus scotoductus nucleic acid
polymerases are distinct from the amino acid sequence of Thermus aquaticus
DNA Polymerase. There are about 51 conservative amino acid differences and
about 62 nonconservative amino acid differences. For example, one region of
dissimilarity is between approximate amino acid positions 51 and 65, where the
sequence of the Thermus scotoductus polymerase has about four amino acid
differences (in bold): LLKALREDG DWIVVFDAK APSFRHQTYE (SEQ
ID NO:39). Another region of dissimilarity is between approximate amino acid
positions 201 and 236, where the sequence of the Thermus scotoductus
polymerase has about seven amino acid differences (in bold):
GEKTAAKLIREWGSLENLLKHLEQV KPASV REKILS (SEQ ID NO:40).
Another region of dissimilarity is between about positions 311 and 350, where
the sequence of the Thermus scotoductus polymerase has about seven amino acid
changes (in bold): VGYVLSRPEPMWAELNALAAAWEGRVYRAEDPLEALRGLG
(SEQ ID NO:41). Another region of
dissimilarity is between about positions 415 and 435, where the sequence of the
Thermus scotoductus polymerase has about five amino acid changes (in bold):
RLYAALLERLKGEERLLWLYE (SEQ ID NO:42). Another region of
dissimilarity is between about positions 531 and 562, where the sequence of the
Thermus scotoductus polymerase has about six amino acid changes (in bold):
PIVDRILQYRELSKLK GTYID PLPALVHPKTN (SEQ ID NO:43). Another
region of dissimilarity is between about positions 801 and 836, where the
sequence of the Thermus scotoductus polymerase has about eight amino acid
changes (in bold): EEVAQEAKRT MEEVWPLKVPLEVEVGIGEDWLSAKA
(SEQ ID NO:44). Hence, many regions of the Thermus scotoductus polymerase
differ from the Thermus aquaticus and Thermus thermophilus DNA
polymerases.

Many DNA polymerases possess activities in addition to a DNA
polymerase activity. Such activities include, for example, a 5'-3' exonuclease
activity and/or a 3'-5' exonuclease activity. The 3'-5' exonuclease activity
improves the accuracy of the newly synthesized strand by removing incorrect
bases that may have been incorporated. DNA polymerases in which such
activity is low or absent are prone to errors in the incorporation of nucleotide
residues into the primer extension strand. Taq DNA polymerase has been
reported to have low 3'-5' exonuclease activity. See Lawyer et al., J. Biol. Chem.
264:6427-6437. In applications such as nucleic acid amplification procedures in
which the replication of DNA is often geometric in relation to the number of
primer extension cycles, such errors can lead to serious artifactual problems such
as sequence heterogeneity of the nucleic acid amplification product (amplicon).
Thus, a 3'-5' exonuclease activity is a desired characteristic of a thermostable
DNA polymerase used for such purposes.

By contrast, the 5'-3' exonuclease activity of DNA polymerase enzymes
is often undesirable because this activity may digest nucleic acids, including
primers, that have an unprotected 5' end. Thus, a thermostable nucleic acid
polymerase with an attenuated 5'-3' exonuclease activity, or in which such
activity is absent, is a desired characteristic of an enzyme for biochemical
applications. Various DNA polymerase enzymes have been described where a
modification has been introduced in a DNA polymerase that accomplishes this
object. For example, the Klenow fragment of E. coli DNA polymerase I can be
produced as a proteolytic fragment of the holoenzyme in which the domain of
the protein controlling the 5'-3' exonuclease activity has been removed. The
Klenow fragment still retains the polymerase activity and the 3'-5' exonuclease
activity. Barnes, PCT Publication No. WO92/06188 (1992) and Gelfand et al.,
U.S. Pat. No. 5,079,352 have produced 5'-3' exonuclease-deficient recombinant
Thermus aquaticus DNA polymerases. Ishino et al., EPO Publication No.
0517418A2, have produced a 5'-3' exonuclease-deficient DNA polymerase
derived from Bacillus caldotenax.

In another embodiment, the invention provides a polypeptide that is a
derivative Thermus scotoductus polypeptide with reduced or eliminated 5'-3'
exonuclease activity. Several methods exist for reducing this activity, and the
invention contemplates any polypeptide derived from the Thermus scotoductus
polypeptides of the invention that has reduced or eliminated such 5'-3'
exonuclease activity. See Xu et al., Biochemical and mutational studies of the 5'-3'
exonuclease of DNA polymerase I of Escherichia coli. J. Mol. Biol. 1997 May
2; 268(2):284-302.

In one embodiment, the invention provides a Thermus scotoductus
nucleic acid polymerase polypeptide from strain X-1 in which Asp is used in
place of Gly at position 46. This polypeptide has SEQ ID NO:17 and reduced
5'-3' exonuclease activity. SEQ ID NO:17 is provided below.

In another embodiment, the invention provides a Thermus scotoductus
nucleic acid polymerase polypeptide from strain X-1 in which Asp is used in
place of Gly at position 46. This polypeptide has SEQ ID NO:18 and reduced
5'-3' exonuclease activity. SEQ ID NO:18 is provided below.

In another embodiment, the invention provides a Thermus scotoductus
nucleic acid polymerase polypeptide from strain SM3 in which Asp is used in
place of Gly at position 46. This polypeptide has SEQ ID NO:19 and reduced
5'-3' exonuclease activity. SEQ ID NO:19 is provided below.

MRAMLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYDFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS DRISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVH TELPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAQ GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV WLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINFGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833

In another embodiment, the invention provides a Thermus scotoductus
nucleic acid polymerase polypeptide from strain Vi7a in which Asp is used in
place of Gly at position 46. This polypeptide has SEQ ID NO:20 and reduced
5'-3' exonuclease activity. SEQ ID NO:20 is provided below.



MRAMLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYDFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS DRISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVH TELPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAP GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV WLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINFGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833

In another embodiment, the invention provides a polypeptide of SEQ ID
NO:21, a derivative Thermus scotoductus polypeptide from strain X-1 with
reduced bias against ddNTP incorporation. SEQ ID NO:21 has Tyr in place of
Phe at position 668. The sequence of SEQ ID NO:21 is below.

MRAMLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYGFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS ERISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVR TDLPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAP GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV RLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINYGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833

In another embodiment, the invention provides a polypeptide of SEQ ID
NO:22, a derivative Thermus scotoductus polypeptide from strain X-1 with
reduced bias against ddNTP incorporation. SEQ ID NO:22 has Tyr in place of
Phe at position 668. The sequence of SEQ ID NO:22 is below.



MLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYGFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS ERISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVR TDLPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAP GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV RLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINYGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833



In another embodiment, the invention provides a polypeptide of SEQ ID NO:23,
a derivative Thermus scotoductus polypeptide from strain SM3 with reduced
bias against ddNTP incorporation. SEQ ID NO:23 has Tyr in place of Phe at
position 668. The sequence of SEQ ID NO:23 is below.



MRAMLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYGFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS DRISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVH TELPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAQ GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV WLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINYGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833

In another embodiment, the invention provides a polypeptide of SEQ ID NO:24,
a derivative Thermus scotoductus polypeptide from strain Vi7a with reduced 
bias against ddNTP incorporation. SEQ ID NO:24 has Tyr in place of Phe at 
position 668.

MRAMLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYGFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS DRISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVH TELPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAP GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV WLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINYGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833



In another embodiment, the invention provides a polypeptide of SEQ ID
NO:25, a derivative Thermus scotoductus polypeptide from strain X-1 with
reduced 5'-3' exonuclease activity and reduced bias against ddNTP
incorporation. SEQ ID NO:25 has Asp in place of Gly at position 46 and Tyr in
place of Phe at position 668. The sequence of SEQ ID NO:25 is below.



MRAMLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYDFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS ERISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVR TDLPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAP GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV RLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINYGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833



In another embodiment, the invention provides a polypeptide of SEQ ID
NO:26, a derivative Thermus scotoductus polypeptide from strain X-1 with
reduced 5'-3' exonuclease activity and reduced bias against ddNTP
incorporation. SEQ ID NO:26 has Asp in place of Gly at position 46 and Tyr in
place of Phe at position 668. The sequence of SEQ ID NO:26 is below.

MLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYDFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS ERISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVR TDLPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAP GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV RLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINYGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833

In another embodiment, the invention provides a polypeptide of SEQ ID
NO:27, a derivative Thermus scotoductus polypeptide from strain SM3 with
reduced 5'-3' exonuclease activity and reduced bias against ddNTP
incorporation. SEQ ID NO:27 has Asp in place of Gly at position 46 and Tyr in
place of Phe at position 668. The sequence of SEQ ID NO:27 is below.

MRAMLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYDFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS DRISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVH TELPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAQ GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV WLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINYGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833

In another embodiment, the invention provides a polypeptide of SEQ ID
NO:28, a derivative Thermus scotoductus polypeptide from strain Vi7a with
reduced 5'-3' exonuclease activity and reduced bias against ddNTP
incorporation. SEQ ID NO:28 has Asp in place of Gly at position 46 and Tyr in
place of Phe at position 668. The sequence of SEQ ID NO:28 is below.

MRAMLPLFEP KGRVLLVDGH HLAYRTFFAL KGLTTSRGEP 40
VQAVYDFAKS LLKALREDGD WIWFDAKA PSFRHQTYEA 80
YKAGRAPTPE DFPRQLALIK EMVDLLGLER LEVPGFEADD 120
VLATLAKKAE KEGYEVRILT ADRDLYQLLS DRISILHPEG 160
YLITPEWLWE KYGLKPSQWV DYRALAGDPS DNIPGVKGIG 200
EKTAAKLIRE WGSLENLLKH LEQVKPASVR EKILSHMEDL 240
KLSLELSRVH TELPLQVDFA RRREPDREGL KAFLERLEFG 280
SLLHEFGLLE SPVAAEEAPW PPPEGAFVGY VLSRPEPMWA 320
ELNALAAAWE GRVYRAEDPL EALRGLGEVR GLLAKDLAVL 360
ALREGIALAP GDDPMLLAYL LDPSNTAPEG VARRYGGEWT 400
EEAGERALLS ERLYAALLER LKGEERLLWL YEEVEKPLSR 440
VLAHMEATGV WLDVAYLKAL SLEVEAELRR LEEEVHRLAG 480
HPFNLNSRDQ LERVLFDELG LPAIGKTEKT GKRSTSAAVL 520
EALREAHPIV DRILQYRELS KLKGTYIDPL PALVHPKTNR 560
LHTRFNQTAT ATGRLSSSDP NLQNIPVRTP LGQRIRRAFV 600
AEEGWRLWL DYSQIELRVL AHLSGDENLI RVFQEGQDIH 640
TQTASWMFGV PPEAVDSLMR RAAKTINYGV LYGMSAHRLS 680
GELAIPYEEA VAFIERYFQS YPKVRAWIEK TLAEGRERGY 720
VETLFGRRRY VPDLASRVKS IREAAERMAF NMPVQGTAAD 760
LMKLAMVKLF PRLQELGARM LLQVHDELVL EAPKEQAEEV 800
AQEAKRTMEE VWPLKVPLEV EVGIGEDWLS AKA 833
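
The point substitutions described above (Asp for Gly at position 46, Tyr for
Phe at position 668) can be pictured as a simple string operation on the
one-letter sequence. The following is a minimal sketch and not part of the
disclosed methods: the function name `substitute` and its `start` parameter
are assumptions made for illustration, and the two fragments are residues
41-50 and 661-670 as printed in the sequences above.

```python
# Hypothetical helper, for illustration only.
def substitute(fragment, position, expected, replacement, start=1):
    """Return `fragment` with the residue at `position` replaced.

    `start` is the sequence number of the first residue in `fragment`,
    so a fragment can keep the numbering of the full-length polypeptide.
    """
    index = position - start
    if not 0 <= index < len(fragment):
        raise ValueError(f"position {position} is outside the fragment")
    if fragment[index] != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {fragment[index]}"
        )
    return fragment[:index] + replacement + fragment[index + 1:]


# Fragments copied from the sequences listed above.
residues_41_50 = "VQAVYGFAKS"
residues_661_670 = "RAAKTINFGV"

g46d = substitute(residues_41_50, 46, "G", "D", start=41)
f668y = substitute(residues_661_670, 668, "F", "Y", start=661)
print(g46d)   # VQAVYDFAKS
print(f668y)  # RAAKTINYGV
```

Checking the expected residue before replacing it guards against applying a
substitution to a fragment whose numbering has been shifted.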



The nucleic acid polymerase polypeptides of the invention have
homology to portions of the amino acid sequences of the thermostable DNA
polymerases of Thermus aquaticus and Thermus thermophilus (see Figure 1).
However, significant portions of the amino acid sequences of the present
invention are distinct, including SEQ ID NO:39-44.

As indicated above, derivative and variant polypeptides of the invention
are derived from the wild type nucleic acid polymerase by deletion or addition of
one or more amino acids to the N-terminal and/or C-terminal end of the wild
type polypeptide; deletion or addition of one or more amino acids at one or more
sites within the wild type polypeptide; or substitution of one or more amino
acids at one or more sites within the wild type polypeptide. Thus, the
polypeptides of the invention may be altered in various ways including amino
acid substitutions, deletions, truncations, and insertions.

Such variant and derivative polypeptides may result, for example, from
genetic polymorphism or from human manipulation. Methods for such
manipulations are generally known in the art. For example, amino acid sequence
variants of the polypeptides can be prepared by mutations in the DNA. Methods
for mutagenesis and nucleotide sequence alterations are well known in the art.
See, for example, Kunkel, Proc. Natl. Acad. Sci. USA, 82, 488 (1985); Kunkel et
al., Methods in Enzymol., 154, 367 (1987); U.S. Patent No. 4,873,192; Walker
and Gaastra, eds., Techniques in Molecular Biology, MacMillan Publishing
Company, New York (1983) and the references cited therein. Guidance as to
appropriate amino acid substitutions that do not affect biological activity of the
protein of interest may be found in the model of Dayhoff et al., Atlas of Protein
Sequence and Structure, Natl. Biomed. Res. Found., Washington, D.C. (1978),
herein incorporated by reference.

The derivatives and variants of the isolated polypeptides of the invention
have identity with at least about 92% of the amino acid positions of any one of
SEQ ID NO:13-28 and have nucleic acid polymerase activity and/or are
thermally stable. In a preferred embodiment, polypeptide derivatives and
variants have identity with at least about 95% of the amino acid positions of any
one of SEQ ID NO:13-28 and have nucleic acid polymerase activity and/or are
thermally stable. In a more preferred embodiment, polypeptide derivatives and
variants have identity with at least about 98% of the amino acid positions of any
one of SEQ ID NO:13-28 and have nucleic acid polymerase activity and/or are
thermally stable.
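
The identity thresholds above (at least about 92%, 95% or 98% of the amino
acid positions) can be illustrated with a short calculation. This is a sketch
under stated assumptions: the two sequences are taken to be already aligned
and of equal length, gaps are not handled, and the function name
`percent_identity` is hypothetical. The specification does not prescribe a
particular alignment method.

```python
# Hypothetical helper, for illustration only; assumes pre-aligned input.
def percent_identity(reference, candidate):
    """Percentage of positions at which two pre-aligned sequences agree."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must be aligned to the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(reference, candidate))
    return 100.0 * matches / len(reference)


# Toy example: residues 1-50, with Gly or Asp at position 46.
wild_type = "".join(
    ["MRAMLPLFEP", "KGRVLLVDGH", "HLAYRTFFAL", "KGLTTSRGEP", "VQAVYGFAKS"]
)
variant = "".join(
    ["MRAMLPLFEP", "KGRVLLVDGH", "HLAYRTFFAL", "KGLTTSRGEP", "VQAVYDFAKS"]
)

identity = percent_identity(wild_type, variant)
print(f"{identity:.1f}%")   # 98.0%
print(identity >= 92.0)     # True
```

A single substitution in a 50-residue fragment gives 49 matching positions out
of 50; over a full-length polypeptide the same change would reduce identity by
a correspondingly smaller amount.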

Amino acid residues of the isolated polypeptides and polypeptide
derivatives and variants can be genetically encoded L-amino acids, naturally
occurring non-genetically encoded L-amino acids, synthetic L-amino acids or
D-enantiomers of any of the above. The amino acid notations used herein for the
twenty genetically encoded L-amino acids and common non-encoded amino
acids are conventional and are as shown in Table 2.



Table 2

Amino Acid                                          One-Letter   Common
                                                    Symbol       Abbreviation

Alanine                                             A            Ala
Arginine                                            R            Arg
Asparagine                                          N            Asn
Aspartic acid                                       D            Asp
Cysteine                                            C            Cys
Glutamine                                           Q            Gln
Glutamic acid                                       E            Glu
Glycine                                             G            Gly
Histidine                                           H            His
Isoleucine                                          I            Ile
Leucine                                             L            Leu
Lysine                                              K            Lys
Methionine                                          M            Met
Phenylalanine                                       F            Phe
Proline                                             P            Pro
Serine                                              S            Ser
Threonine                                           T            Thr
Tryptophan                                          W            Trp
Tyrosine                                            Y            Tyr
Valine                                              V            Val
β-Alanine                                                        bAla
2,3-Diaminopropionic acid                                        Dpr
α-Aminoisobutyric acid                                           Aib
N-Methylglycine (sarcosine)                                      MeGly
Ornithine                                                        Orn
Citrulline                                                       Cit
t-Butylalanine                                                   t-BuA
t-Butylglycine                                                   t-BuG
N-methylisoleucine                                               MeIle
Phenylglycine                                                    Phg
Cyclohexylalanine                                                Cha
Norleucine                                                       Nle
Naphthylalanine                                                  Nal
Pyridylalanine
3-Benzothienyl alanine
4-Chlorophenylalanine                                            Phe(4-Cl)
2-Fluorophenylalanine                                            Phe(2-F)
3-Fluorophenylalanine                                            Phe(3-F)
4-Fluorophenylalanine                                            Phe(4-F)
Penicillamine                                                    Pen
1,2,3,4-Tetrahydroisoquinoline-3-carboxylic acid                 Tic
β-2-thienylalanine                                               Thi
Methionine sulfoxide                                             MSO
Homoarginine                                                     hArg
N-acetyl lysine                                                  AcLys
2,4-Diamino butyric acid                                         Dbu
p-Aminophenylalanine                                             Phe(pNH2)
N-methylvaline                                                   MeVal
Homocysteine                                                     hCys
Homoserine                                                       hSer
ε-Amino hexanoic acid                                            Aha
δ-Amino valeric acid                                             Ava
2,3-Diaminobutyric acid                                          Dab
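
The one-letter symbols and common abbreviations of Table 2 map onto each other
directly. As an illustration only, the sketch below converts a one-letter
sequence into three-letter abbreviations. The dictionary covers just the
twenty genetically encoded amino acids, and the names `THREE_LETTER` and
`to_three_letter` are assumptions made for this example.

```python
# Illustrative mapping for the twenty genetically encoded amino acids.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


# Hypothetical helper, for illustration only.
def to_three_letter(sequence, separator="-"):
    """Spell out a one-letter sequence using three-letter abbreviations."""
    try:
        return separator.join(THREE_LETTER[residue] for residue in sequence)
    except KeyError as error:
        raise ValueError(f"unknown one-letter symbol: {error.args[0]}") from None


print(to_three_letter("MRAMLPLFEP"))
# Met-Arg-Ala-Met-Leu-Pro-Leu-Phe-Glu-Pro
```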



Polypeptide variants that are encompassed within the scope of the 
invention can have one or more amino acids substituted with an amino acid of 
similar chemical and/or physical properties, so long as these variant polypeptides 
retain polymerase activity and/or remain thermally stable. Derivative 
polypeptides can have one or more amino acids substituted with amino acids 
having different chemical and/or physical properties, so long as these variant 
polypeptides retain polymerase activity and/or remain thermally stable. 

Amino acids that are substitutable for each other in the present variant 
polypeptides generally reside within similar classes or subclasses. As known to 
one of skill in the art, amino acids can be placed into three main classes: 
hydrophilic amino acids, hydrophobic amino acids and cysteine-like amino 
acids, depending primarily on the characteristics of the amino acid side chain. 
These main classes may be further divided into subclasses. Hydrophilic amino
acids include amino acids having acidic, basic or polar side chains and 
hydrophobic amino acids include amino acids having aromatic or apolar side 
chains. Apolar amino acids may be further subdivided to include, among others, 
aliphatic amino acids. The definitions of the classes of amino acids as used 
herein are as follows: 

"Hydrophobic Amino Acid" refers to an amino acid having a side chain
that is uncharged at physiological pH and that is repelled by aqueous solution.
Examples of genetically encoded hydrophobic amino acids include Ile, Leu and
Val. Examples of non-genetically encoded hydrophobic amino acids include
t-BuA.

"Aromatic Amino Acid" refers to a hydrophobic amino acid having a
side chain containing at least one ring having a conjugated π-electron system
(aromatic group). The aromatic group may be further substituted with substituent
groups such as alkyl, alkenyl, alkynyl, hydroxyl, sulfonyl, nitro and amino
groups, as well as others. Examples of genetically encoded aromatic amino acids
include phenylalanine, tyrosine and tryptophan. Commonly encountered
non-genetically encoded aromatic amino acids include phenylglycine,
2-naphthylalanine, β-2-thienylalanine,
1,2,3,4-tetrahydroisoquinoline-3-carboxylic acid, 4-chlorophenylalanine,
2-fluorophenylalanine, 3-fluorophenylalanine and 4-fluorophenylalanine.

"Apolar Amino Acid" refers to a hydrophobic amino acid having a side
chain that is generally uncharged at physiological pH and that is not polar.
Examples of genetically encoded apolar amino acids include glycine, proline and
methionine. Examples of non-encoded apolar amino acids include Cha.

"Aliphatic Amino Acid" refers to an apolar amino acid having a saturated
or unsaturated straight chain, branched or cyclic hydrocarbon side chain.
Examples of genetically encoded aliphatic amino acids include Ala, Leu, Val
and Ile. Examples of non-encoded aliphatic amino acids include Nle.

"Hydrophilic Amino Acid" refers to an amino acid having a side chain
that is attracted by aqueous solution. Examples of genetically encoded
hydrophilic amino acids include Ser and Lys. Examples of non-encoded
hydrophilic amino acids include Cit and hCys.

"Acidic Amino Acid" refers to a hydrophilic amino acid having a side
chain pK value of less than 7. Acidic amino acids typically have negatively
charged side chains at physiological pH due to loss of a hydrogen ion. Examples
of genetically encoded acidic amino acids include aspartic acid (aspartate) and
glutamic acid (glutamate).

"Basic Amino Acid" refers to a hydrophilic amino acid having a side
chain pK value of greater than 7. Basic amino acids typically have positively
charged side chains at physiological pH due to association with hydronium ion.
Examples of genetically encoded basic amino acids include arginine, lysine and
histidine. Examples of non-genetically encoded basic amino acids include the
non-cyclic amino acids ornithine, 2,3-diaminopropionic acid, 2,4-diaminobutyric
acid and homoarginine.

"Polar Amino Acid" refers to a hydrophilic amino acid having a side
chain that is uncharged at physiological pH, but which has a bond in which the
pair of electrons shared in common by two atoms is held more closely by one of
the atoms. Examples of genetically encoded polar amino acids include
asparagine and glutamine. Examples of non-genetically encoded polar amino
acids include citrulline, N-acetyl lysine and methionine sulfoxide.

"Cysteine-Like Amino Acid" refers to an amino acid having a side chain
capable of forming a covalent linkage with a side chain of another amino acid
residue, such as a disulfide linkage. Typically, cysteine-like amino acids
have a side chain containing at least one thiol (SH) group. Examples
of genetically encoded cysteine-like amino acids include cysteine. Examples of
non-genetically encoded cysteine-like amino acids include homocysteine and
penicillamine.



As will be appreciated by those having skill in the art, the above
classifications are not absolute. Several amino acids exhibit more than one
characteristic property, and can therefore be included in more than one category.
For example, tyrosine has both an aromatic ring and a polar hydroxyl group.
Thus, tyrosine has dual properties and can be included in both the aromatic and
polar categories. Similarly, in addition to being able to form disulfide linkages,
cysteine also has apolar character. Thus, while not strictly classified as a
hydrophobic or apolar amino acid, in many instances cysteine can be used to
confer hydrophobicity to a polypeptide.

Certain commonly encountered amino acids that are not genetically
encoded and that can be present, or substituted for an amino acid, in the variant
polypeptides of the invention include, but are not limited to, β-alanine (b-Ala)
and other omega-amino acids such as 3-aminopropionic acid (Dap),
2,3-diaminopropionic acid (Dpr), 4-aminobutyric acid and so forth;
α-aminoisobutyric acid (Aib); ε-aminohexanoic acid (Aha); δ-aminovaleric acid
(Ava); N-methylglycine (MeGly); ornithine (Orn); citrulline (Cit); t-butylalanine
(t-BuA); t-butylglycine (t-BuG); N-methylisoleucine (MeIle); phenylglycine
(Phg); cyclohexylalanine (Cha); norleucine (Nle); 2-naphthylalanine (2-Nal);
4-chlorophenylalanine (Phe(4-Cl)); 2-fluorophenylalanine (Phe(2-F));
3-fluorophenylalanine (Phe(3-F)); 4-fluorophenylalanine (Phe(4-F)); penicillamine
(Pen); 1,2,3,4-tetrahydroisoquinoline-3-carboxylic acid (Tic);
β-2-thienylalanine (Thi); methionine sulfoxide (MSO); homoarginine (hArg);
N-acetyl lysine (AcLys); 2,3-diaminobutyric acid (Dab); 2,4-diaminobutyric acid
(Dbu); p-aminophenylalanine (Phe(pNH2)); N-methyl valine (MeVal);
homocysteine (hCys) and homoserine (hSer). These amino acids also fall into
the categories defined above.

The classifications of the above-described genetically encoded and
non-encoded amino acids are summarized in Table 3, below. It is to be understood
that Table 3 is for illustrative purposes only and does not purport to be an
exhaustive list of amino acid residues that may comprise the variant and
derivative polypeptides described herein. Other amino acid residues that are
useful for making the variant and derivative polypeptides described herein can
be found, e.g., in Fasman, 1989, CRC Practical Handbook of Biochemistry and
Molecular Biology, CRC Press, Inc., and the references cited therein. Amino
acids not specifically mentioned herein can be conveniently classified into the
above-described categories on the basis of known behavior and/or their
characteristic chemical and/or physical properties as compared with amino acids
specifically identified.

TABLE 3

Classification   Genetically      Genetically Non-Encoded
                 Encoded

Hydrophobic      F, L, I, V
Aromatic         F, Y, W          Phg, Nal, Thi, Tic, Phe(4-Cl),
                                  Phe(2-F), Phe(3-F), Phe(4-F),
                                  Pyridyl Ala, Benzothienyl Ala
Apolar           M, G, P
Aliphatic        A, V, L, I       t-BuA, t-BuG, MeIle, Nle,
                                  MeVal, Cha, bAla, MeGly, Aib
Hydrophilic      S, K             Cit, hCys
Acidic           D, E
Basic            H, K, R          Dpr, Orn, hArg, Phe(p-NH2),
                                  DBU, A2BU
Polar            Q, N, S, T, Y    Cit, AcLys, MSO, hSer
Cysteine-Like    C                Pen, hCys, β-methyl Cys



Polypeptides of the invention can have any amino acid substituted by any
similarly classified amino acid to create a variant peptide, so long as the peptide
variant is thermally stable and/or retains DNA polymerase activity.
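
The rule that a residue may be replaced by any similarly classified residue
can be sketched as a lookup against the genetically encoded column of Table 3.
This is an illustrative sketch only: the names `CLASSES`, `classes_of` and
`is_similarly_classified` are hypothetical, only genetically encoded residues
are covered, and the hydrophobic row is taken to be F, L, I, V. A residue may
belong to more than one class, as noted above for tyrosine.

```python
# Genetically encoded residues per class, transcribed from Table 3.
# Assumption: the Hydrophobic row is read as F, L, I, V.
CLASSES = {
    "Hydrophobic": set("FLIV"),
    "Aromatic": set("FYW"),
    "Apolar": set("MGP"),
    "Aliphatic": set("AVLI"),
    "Hydrophilic": set("SK"),
    "Acidic": set("DE"),
    "Basic": set("HKR"),
    "Polar": set("QNSTY"),
    "Cysteine-Like": set("C"),
}


# Hypothetical helpers, for illustration only.
def classes_of(residue):
    """Return the set of class names that list `residue`."""
    return {name for name, members in CLASSES.items() if residue in members}


def is_similarly_classified(original, replacement):
    """True if the two residues share at least one class."""
    return bool(classes_of(original) & classes_of(replacement))


print(is_similarly_classified("F", "Y"))  # True: both are Aromatic
print(is_similarly_classified("G", "D"))  # False: Apolar versus Acidic
print(sorted(classes_of("Y")))            # ['Aromatic', 'Polar']
```

Under this scheme the Phe to Tyr change at position 668 stays within the
aromatic class, whereas the Gly to Asp change at position 46 crosses classes.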

"Domain shuffling" or construction of "thermostable chimeric nucleic
acid polymerases" may be used to provide thermostable polymerases containing
novel properties. For example, placement of codons 289-422 from the Thermus
scotoductus DNA polymerase coding sequence after codons 1-288 of the
Thermus aquaticus DNA polymerase would yield a novel thermostable nucleic
acid polymerase containing the 5'-3' exonuclease domain of Thermus aquaticus
DNA polymerase (1-289), the 3'-5' exonuclease domain of Thermus
scotoductus nucleic acid polymerase (289-422), and the DNA polymerase
domain of Thermus aquaticus DNA polymerase (423-832). Alternatively, the
5'-3' exonuclease domain and the 3'-5' exonuclease domain of Thermus
scotoductus nucleic acid polymerase may be fused to the DNA polymerase
(dNTP binding and primer/template binding domains) portions of Thermus
aquaticus DNA polymerase (about codons 423-832). The donors and recipients
need not be limited to Thermus aquaticus and Thermus scotoductus
polymerases. Thermus thermophilus DNA polymerase 3'-5' exonuclease, 5'-3'
exonuclease and DNA polymerase domains can similarly be exchanged for those
in the Thermus scotoductus polymerases of the invention.
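
The domain-shuffling example can be pictured as concatenating residue ranges
taken from two parent sequences. The sketch below is a toy illustration under
stated assumptions: the helper names `segment` and `chimera` are hypothetical,
ranges are 1-based and inclusive, and the parents are short placeholder
strings rather than real polymerase sequences.

```python
# Hypothetical helpers, for illustration only.
def segment(sequence, first, last):
    """Return residues first..last (1-based, inclusive)."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError(f"range {first}-{last} is outside the sequence")
    return sequence[first - 1:last]


def chimera(parts):
    """Join (sequence, first, last) parts in the order given."""
    return "".join(segment(seq, first, last) for seq, first, last in parts)


# Placeholder parents; real use would pass full-length sequences and
# ranges such as 1-288, 289-422 and 423-832 from the text.
parent_a = "A" * 12
parent_b = "B" * 12

shuffled = chimera([(parent_a, 1, 4), (parent_b, 5, 8), (parent_a, 9, 12)])
print(shuffled)  # AAAABBBBAAAA
```

Keeping the ranges explicit makes it easy to check that adjacent segments
neither overlap nor leave a gap at the junctions.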

It has been demonstrated that the exonuclease domain of Thermus
aquaticus Polymerase I can be removed from the amino terminus of the protein
without a significant loss of thermostability or polymerase activity (Erlich et al.,
(1991) Science 252:1643-1651; Barnes, W.M., (1992) Gene 112:29-35; Lawyer
et al., (1989) JBC 264:6427-6437). Other N-terminal deletions similarly have
been shown to maintain thermostability and activity (Vainshtein et al., (1996)
Protein Science 5:1785-1792 and references therein). Therefore this invention
also includes similarly truncated forms of any of the wild type or variant
polymerases provided herein. For example, the invention is also directed to an
active truncated variant of any of the polymerases provided by the invention in
which the first 330 amino acids are removed.

Moreover, the invention provides SEQ ID NO:45, a truncated form of
a polymerase in which the N-terminal 289 amino acids have been removed from
the wild type Thermus scotoductus polymerase from strain X-1.

E SPVAAEEAPW 300
PPPEGAFVGY VLSRPEPMWA ELNALAAAWE GRVYRAEDPL 340
EALRGLGEVR GLLAKDLAVL ALREGIALAP GDDPMLLAYL 380
LDPSNTAPEG VARRYGGEWT EEAGERALLS ERLYAALLER 420
LKGEERLLWL YEEVEKPLSR VLAHMEATGV RLDVAYLKAL 460
SLEVEAELRR LEEEVHRLAG HPFNLNSRDQ LERVLFDELG 500
LPAIGKTEKT GKRSTSAAVL EALREAHPIV DRILQYRELS 540
KLKGTYIDPL PALVHPKTNR LHTRFNQTAT ATGRLSSSDP 580
NLQNIPVRTP LGQRIRRAFV AEEGWRLWL DYSQIELRVL 620
AHLSGDENLI RVFQEGQDIH TQTASWMFGV PPEAVDSLMR 660
RAAKTINFGV LYGMSAHRLS GELAIPYEEA VAFIERYFQS 700
YPKVRAWIEK TLAEGRERGY VETLFGRRRY VPDLASRVKS 740
IREAAERMAF NMPVQGTAAD LMKLAMVKLF PRLQELGARM 780
LLQVHDELVL EAPKEQAEEV AQEAKRTMEE VWPLKVPLEV 820
EVGIGEDWLS AKA 833

Moreover, the invention provides SEQ ID NO:46, a truncated form of a
polymerase in which the N-terminal 289 amino acids have been removed from
the wild type Thermus scotoductus polymerase from strain SM3.

E SPVAAEEAPW 300
PPPEGAFVGY VLSRPEPMWA ELNALAAAWE GRVYRAEDPL 340
EALRGLGEVR GLLAKDLAVL ALREGIALAQ GDDPMLLAYL 380
LDPSNTAPEG VARRYGGEWT EEAGERALLS ERLYAALLER 420
LKGEERLLWL YEEVEKPLSR VLAHMEATGV WLDVAYLKAL 460
SLEVEAELRR LEEEVHRLAG HPFNLNSRDQ LERVLFDELG 500
LPAIGKTEKT GKRSTSAAVL EALREAHPIV DRILQYRELS 540
KLKGTYIDPL PALVHPKTNR LHTRFNQTAT ATGRLSSSDP 580
NLQNIPVRTP LGQRIRRAFV AEEGWRLWL DYSQIELRVL 620
AHLSGDENLI RVFQEGQDIH TQTASWMFGV PPEAVDSLMR 660
RAAKTINFGV LYGMSAHRLS GELAIPYEEA VAFIERYFQS 700
YPKVRAWIEK TLAEGRERGY VETLFGRRRY VPDLASRVKS 740
IREAAERMAF NMPVQGTAAD LMKLAMVKLF PRLQELGARM 780
LLQVHDELVL EAPKEQAEEV AQEAKRTMEE VWPLKVPLEV 820
EVGIGEDWLS AKA 833

Moreover, the invention provides SEQ ID NO:47, a truncated form of a
polymerase in which the N-terminal 289 amino acids have been removed from
the wild type Thermus scotoductus polymerase from strain Vi7a.

E SPVAAEEAPW 300

PPPEGAFVGY VLSRPEPMWA ELNALAAAWE GRVYRAEDPL 340

EALRGLGEVR GLLAKDLAVL ALREGIALAP GDDPMLLAYL 380

LDPSNTAPEG VARRYGGEWT EEAGERALLS ERLYAALLER 420

LKGEERLLWL YEEVEKPLSR VLAHMEATGV WLDVAYLKAL 460

SLEVEAELRR LEEEVHRLAG HPFNLNSRDQ LERVLFDELG 500

LPAIGKTEKT GKRSTSAAVL EALREAHPIV DRILQYRELS 540

KLKGTYIDPL PALVHPKTNR LHTRFNQTAT ATGRLSSSDP 580

NLQNIPVRTP LGQRIRRAFV AEEGWRLWL DYSQIELRVL 620

AHLSGDENLI RVFQEGQDIH TQTASWMFGV PPEAVDSLMR 660

RAAKTINFGV LYGMSAHRLS GELAIPYEEA VAFIERYFQS 700

YPKVRAWIEK TLAEGRERGY VETLFGRRRY VPDLASRVKS 740

IREAAERMAF NMPVQGTAAD LMKLAMVKLF PRLQELGARM 780

LLQVHDELVL EAPKEQAEEV AQEAKRTMEE VWPLKVPLEV 820

EVGIGEDWLS AKA 833
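
The numbering in these listings follows the full-length polymerase: because
289 N-terminal residues are removed, the first residue shown (E) is position
290. The following is a minimal Python sketch, not part of the original
disclosure, of how that offset can be applied and how two listing rows can be
compared position by position. The function names are hypothetical, and the
two rows are copied as printed for positions 341-380 of the strain X-1 and
strain SM3 listings, so any transcription error in the printed rows would
carry over.

```python
# Number of N-terminal residues removed in the truncated forms (from the text).
TRUNCATION_OFFSET = 289


def full_length_position(truncated_index: int) -> int:
    """Map a 1-based index in the truncated listing to full-length numbering."""
    if truncated_index < 1:
        raise ValueError("index must be 1-based")
    return truncated_index + TRUNCATION_OFFSET


def row_differences(row_a: str, row_b: str, first_position: int):
    """List (position, residue_a, residue_b) wherever two aligned rows differ."""
    a = row_a.replace(" ", "")
    b = row_b.replace(" ", "")
    if len(a) != len(b):
        raise ValueError("rows must have the same length")
    return [
        (first_position + i, x, y)
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


# Rows for positions 341-380, as printed in the listings.
x1_row = "EALRGLGEVR GLLAKDLAVL ALREGIALAP GDDPMLLAYL"
sm3_row = "EALRGLGEVR GLLAKDLAVL ALREGIALAQ GDDPMLLAYL"

print(full_length_position(1))
print(row_differences(x1_row, sm3_row, 341))
```

Comparing whole listings the same way would require rows of equal length in
both sequences.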

Thus, the polypeptides of the invention encompass both naturally 
occurring proteins as well as variations, truncations and modified forms thereof. 
Such variants will continue to possess the desired activity. The deletions, 
insertions, and substitutions of the polypeptide sequence encompassed herein are 
not expected to produce radical changes in the characteristics of the polypeptide. 
One skilled in the art can readily evaluate the thermal stability and polymerase 
activity of the polypeptides and variant polypeptides of the invention by routine 
screening assays. 

Kits and compositions containing the present polypeptides are
substantially free of cellular material. Such preparations and compositions have
less than about 30%, 20%, 10% or 5% (by dry weight) of contaminating bacterial
cellular protein.

The activity of nucleic acid polymerase polypeptides and variant
polypeptides can be assessed by any procedure known to one of skill in the art.
For example, the DNA synthetic activity of the variant and non-variant
polymerase polypeptides of the invention can be tested in a standard DNA
sequencing or DNA primer extension reaction. One such assay can be
performed in a 100 µl (final volume) reaction mixture containing, for example,
0.1 mM dCTP, dTTP, dGTP, α-32P-dATP, 0.3 mg/ml activated calf thymus
DNA and 0.5 mg/ml BSA in a buffer containing: 50 mM KCl, 1 mM DTT, 10
mM MgCl2 and 50 mM of a buffering compound such as PIPES, Tris or
Triethylamine. A dilution to 0.1 units/µl of each polymerase enzyme is
prepared, and 5 µl of such a dilution is added to the reaction mixture, followed
by incubation at 60 °C for 10 minutes. Reaction products can be detected by
determining the amount of 32P incorporated into DNA or by observing the
products after separation on a polyacrylamide gel.

Uses for Nucleic Acid Polymerase Polypeptides

The thermostable enzyme of this invention may be used for any purpose
in which DNA polymerase or reverse transcriptase activity is necessary or
desired. For example, the present nucleic acid polymerase polypeptides can be
used in one or more of the following procedures: DNA sequencing, DNA
amplification, RNA amplification, reverse transcription, DNA synthesis and/or
primer extension. The nucleic acid polymerase polypeptides of the invention
can be used to amplify DNA by polymerase chain reaction (PCR). The nucleic
acid polymerase polypeptides of the invention can be used to sequence DNA by
Sanger sequencing procedures. The nucleic acid polymerase polypeptides of the
invention can also be used in primer extension reactions. The nucleic acid
polymerase polypeptides of the invention can also be used for reverse
transcription. The nucleic acid polymerase polypeptides of the invention can be
used to test for single nucleotide polymorphisms (SNPs) by single nucleotide
primer extension using terminator nucleotides. Any such procedures and related
procedures, for example, polynucleotide or primer labeling, minisequencing and
the like are contemplated for use with the present nucleic acid polymerase
polypeptides.

Methods of the invention comprise the step of extending a primed
polynucleotide template with at least one labeled nucleotide, wherein the
extension is catalyzed by a nucleic acid polymerase of the invention. Nucleic
acid polymerases used for Sanger sequencing can produce fluorescently labeled
products that are analyzed on an automated fluorescence-based sequencing
apparatus such as an Applied Biosystems 310 or 377 (Applied Biosystems,
Foster City, Calif.). Detailed protocols for Sanger sequencing are known to those
skilled in the art and may be found, for example, in Sambrook et al., Molecular
Cloning, A Laboratory Manual, Second Edition, Cold Spring Harbor Press, Cold
Spring Harbor, N.Y. (1989).

In one embodiment, the nucleic acid polymerase polypeptides of the
invention are used for DNA amplification. The polypeptides can be used in any
procedure that employs a DNA polymerase, for example, polymerase chain
reaction (PCR) assays, strand displacement amplification and other
amplification procedures. Strand displacement amplification can be used as
described in Walker et al. (1992) Nucl. Acids Res. 20, 1691-1696. The term
"polymerase chain reaction" ("PCR") refers to the method of K. B. Mullis,
U.S. Pat. Nos. 4,683,195; 4,683,202; and 4,965,188, hereby incorporated by
reference, which describe a method for increasing the concentration of a
segment of a target sequence in a mixture of genomic DNA or other DNA or RNA
without cloning or purification.

The PCR process for amplifying a target sequence consists of introducing
a large excess of two oligonucleotide primers to the DNA mixture containing the
desired target sequence, followed by a precise sequence of thermal cycling in the
presence of a nucleic acid polymerase. The two primers are complementary to
their respective strands of the double stranded target sequence. To effect
amplification, the mixture is denatured and the primers annealed to their
complementary sequences within the target molecule. Following annealing, the
primers are extended with a polymerase so as to form a new pair of
complementary strands. The steps of denaturation, primer annealing and
polymerase extension can be repeated many times. Each round of denaturation,
annealing and extension constitutes one "cycle." There can be numerous cycles,
and the amount of amplified DNA produced increases with the number of cycles.
Hence, to obtain a high concentration of an amplified target nucleic acid, many
cycles are performed.
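
The dependence of product amount on cycle number can be illustrated with a
short calculation. The following Python sketch is an idealized model and not
part of the original disclosure: it assumes that every cycle multiplies the
number of target copies by (1 + efficiency), where an efficiency of 1.0 means
perfect doubling. The function name and the efficiency parameter are
hypothetical, and real reactions reach a plateau that this model ignores.

```python
def ideal_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds, assuming a constant per-cycle efficiency.

    efficiency = 1.0 models perfect doubling; 0.0 models no amplification.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Growth from a single template copy under perfect doubling.
for n in (10, 20, 30):
    print(n, ideal_copies(1, n))
```

Under these assumptions, each additional ten cycles increases the product by
about a thousandfold, which is why many cycles are used when a high
concentration of product is needed.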

The steps involved in the PCR nucleic acid amplification method are
described in more detail below. For ease of discussion, the nucleic acid to be
amplified is described as being double-stranded. However, the process is
equally useful for amplifying a single-stranded nucleic acid, such as an mRNA,
although the ultimate product is generally double-stranded DNA. In the
amplification of a single-stranded nucleic acid, the first step involves the
synthesis of a complementary strand (one of the two amplification primers can
be used for this purpose), and the succeeding steps proceed as follows:

(a) contacting each nucleic acid strand with four different nucleoside
triphosphates and one oligonucleotide primer for each strand of the specific
sequence being amplified, wherein each primer is selected to be substantially
complementary to the different strands of the specific sequence, such that the
extension product synthesized from one primer, when it is separated from its
complement, can serve as a template for synthesis of the extension product of the
other primer, such contacting being at a temperature that allows hybridization of
each primer to a complementary nucleic acid strand;

(b) contacting each nucleic acid strand, at the same time as or after step
(a), with a nucleic acid polymerase of the invention that enables combination of
the nucleoside triphosphates to form primer extension products complementary
to each strand of the specific nucleic acid sequence;

(c) maintaining the mixture from step (b) at an effective temperature for
an effective time to promote the activity of the enzyme and to synthesize, for
each different sequence being amplified, an extension product of each primer
that is complementary to each nucleic acid strand template, but not so high as to
separate each extension product from the complementary strand template;

(d) heating the mixture from step (c) for an effective time and at an
effective temperature to separate the primer extension products from the
templates on which they were synthesized to produce single-stranded molecules
but not so high as to denature irreversibly the enzyme;

(e) cooling the mixture from step (d) for an effective time and to an
effective temperature to promote hybridization of a primer to each of the single-
stranded molecules produced in step (d); and

(f) maintaining the mixture from step (e) at an effective temperature for
an effective time to promote the activity of the enzyme and to synthesize, for
each different sequence being amplified, an extension product of each primer
that is complementary to each nucleic acid template produced in step (d) but not
so high as to separate each extension product from the complementary strand
template. The effective times and temperatures in steps (e) and (f) may coincide,
so that steps (e) and (f) can be carried out simultaneously. Steps (d)-(f) are
repeated until the desired level of amplification is obtained.
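
The repetition of steps (d)-(f) can be written out as a simple cycling
schedule. The following Python sketch is illustrative only and not part of
the original disclosure. The temperatures and hold times are hypothetical
placeholder values, since the text specifies only "effective" temperatures
and times, and the function names are likewise assumptions. The two-step
variant models the case in which steps (e) and (f) coincide.

```python
from typing import List, Optional, Sequence, Tuple

# One hold step: (label, temperature in degrees C, hold time in seconds).
Step = Tuple[str, float, int]


def build_schedule(
    cycle_steps: Sequence[Step],
    cycles: int,
    initial: Optional[Step] = None,
    final: Optional[Step] = None,
) -> List[Step]:
    """Expand one cycle's steps into the full list of holds for a run."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    schedule: List[Step] = []
    if initial is not None:
        schedule.append(initial)
    for _ in range(cycles):
        schedule.extend(cycle_steps)
    if final is not None:
        schedule.append(final)
    return schedule


def total_hold_seconds(schedule: Sequence[Step]) -> int:
    """Sum of hold times only; ramping between temperatures is ignored."""
    return sum(seconds for _, _, seconds in schedule)


# Hypothetical values chosen only to make the example concrete.
three_step = [
    ("denature, step (d)", 94.0, 30),
    ("anneal, step (e)", 55.0, 30),
    ("extend, step (f)", 72.0, 60),
]
two_step = [
    ("denature, step (d)", 94.0, 30),
    ("anneal and extend, steps (e)+(f)", 65.0, 90),
]

program = build_schedule(three_step, cycles=30)
print(len(program), total_hold_seconds(program))
```

Representing each cycle as a list of steps keeps the three-step and two-step
cases in one code path; only the list passed in changes.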

The amplification method is useful not only for producing large amounts
of a specific nucleic acid sequence of known sequence but also for producing
nucleic acid sequences that are known to exist but are not completely specified.
One need know only a sufficient number of bases at both ends of the sequence in
sufficient detail so that two oligonucleotide primers can be prepared that will
hybridize to different strands of the desired sequence at relative positions along
the sequence such that an extension product synthesized from one primer, when
separated from the template (complement), can serve as a template for extension
of the other primer. The greater the knowledge about the bases at both ends of
the sequence, the greater can be the specificity of the primers for the target
nucleic acid sequence.

Thermally stable nucleic acid polymerases are therefore generally used
for PCR because they can function at the high temperatures used for melting
double stranded target DNA and annealing the primers during each cycle of the
PCR reaction. High temperature results in thermodynamic conditions that favor
primer hybridization with the target sequences and not hybridization with non-
target sequences (H. A. Erlich (ed.), PCR Technology, Stockton Press [1989]).

The thermostable nucleic acid polymerases of the present invention
satisfy the requirements for effective use in amplification reactions such as PCR.
The present polymerases do not become irreversibly denatured (inactivated)
when subjected to the required elevated temperatures for the time necessary to
melt double-stranded nucleic acids during the amplification process. Irreversible
denaturation for purposes herein refers to permanent and complete loss of
enzymatic activity. The heating conditions necessary for nucleic acid
denaturation will depend, e.g., on the buffer salt concentration and the
composition and length of the nucleic acids being denatured, but typically
denaturation can be done at temperatures ranging from about 90°C to about
105°C. The time required for denaturation depends mainly on the temperature
and the length of the duplex nucleic acid. Typically the time needed for
denaturation ranges from a few seconds up to four minutes. Higher temperatures
may be required as the salt concentration of the buffer, or the length and/or GC
composition of the nucleic acid is increased. The nucleic acid polymerases of the
invention do not become irreversibly denatured for relatively short exposures to
temperatures of about 90°C to 100°C.

The thermostable polymerases of the invention have an optimum
temperature at which they function that is higher than about 45°C.
Temperatures below 45°C facilitate hybridization of primer to template, but
depending on salt composition and concentration and primer composition and
length, hybridization of primer to template can occur at higher temperatures
(e.g., 45°C to 70°C), which may promote specificity of the primer
hybridization reaction. The polymerases of the invention exhibit activity over a
broad temperature range from about 37°C to about 90°C.

The present polymerases have particular utility for PCR not only because
of their thermal stability but also because of their ability to synthesize DNA
using an RNA template and because of their fidelity in replicating the template
nucleic acid. In most PCR reactions that start with an RNA template, reverse
transcriptase must be added. However, use of reverse transcriptase has certain
drawbacks. First, it is not stable at higher temperatures. Hence, once the initial
complementary DNA (cDNA) has been made by reverse transcriptase and the
thermal cycles of PCR are started, the original RNA template is not used as a
template in the amplification reaction. Second, reverse transcriptase does not
produce a cDNA copy with particularly good sequence fidelity. With PCR, it is
possible to amplify a single copy of a specific target or template nucleic acid to a
level detectable by several different methodologies. However, if the sequence of
the target nucleic acid is not replicated with fidelity, then the amplified product
can include a pool of nucleic acids with diverse sequences. Hence, the nucleic
acid polymerases of the invention, which can accurately reverse transcribe RNA
and replicate the sequence of the template RNA or DNA with high fidelity, are
highly desirable.

Any nucleic acid can act as a "target nucleic acid" for the PCR methods
of the invention. The term "target," when used in reference to the polymerase
chain reaction, refers to the region of nucleic acid bounded by the primers used
for polymerase chain reaction. In addition to genomic DNA and mRNA, any
cDNA, RNA, oligonucleotide or polynucleotide can be amplified with the
appropriate set of primer molecules. In particular, the amplified segments
created by the PCR process itself are, themselves, efficient templates for
subsequent PCR amplifications. The length of the amplified segment of the
desired target sequence is determined by the relative positions of the primers
with respect to each other, and therefore, this length is readily controlled.
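
The relationship between primer positions and product length can be made
concrete with a short calculation. The following Python sketch is illustrative
and not part of the original disclosure. It assumes 1-based coordinates on one
strand of the template, with each primer located by the template position that
pairs with its 5' end; the function name and the example coordinates are
hypothetical.

```python
def amplicon_length(forward_5prime: int, reverse_5prime: int) -> int:
    """Length of the segment bounded by the 5' ends of the two primers.

    Both arguments are 1-based positions on the same template strand.
    """
    if forward_5prime < 1 or reverse_5prime < 1:
        raise ValueError("positions are 1-based")
    if reverse_5prime < forward_5prime:
        raise ValueError("reverse primer must lie downstream of forward primer")
    return reverse_5prime - forward_5prime + 1


# Hypothetical coordinates: primers whose 5' ends pair with positions 101 and 600.
print(amplicon_length(101, 600))
```

Moving either primer along the template changes the product length by the
same number of bases, which is the sense in which the length is readily
controlled.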

The amplified target nucleic acid can be detected by any method known
to one of skill in the art. For example, target nucleic acids are often amplified to
such an extent that they form a band visible on a size separation gel. Target
nucleic acids can also be detected by hybridization with a labeled probe; by
incorporation of biotinylated primers during PCR followed by avidin-enzyme
conjugate detection; by incorporation of 32P-labeled deoxynucleotide
triphosphates during PCR, and the like.

The amount of amplification can also be monitored, for example, by use
of a reporter-quencher oligonucleotide as described in U.S. Patent 5,723,591,
and a nucleic acid polymerase of the invention that has 5' - 3' nuclease activity.
The reporter-quencher oligonucleotide has an attached reporter molecule and an
attached quencher molecule that is capable of quenching the fluorescence of the
reporter molecule when the two are in proximity. Quenching occurs when the
reporter-quencher oligonucleotide is not hybridized to a complementary nucleic
acid because the reporter molecule and the quencher molecule tend to be in
proximity or at an optimal distance for quenching. When hybridized, the
reporter-quencher oligonucleotide emits more fluorescence than when
unhybridized because the reporter molecule and the quencher molecule tend to
be further apart. To monitor amplification, the reporter-quencher
oligonucleotide is designed to hybridize 3' to an amplification primer. During
amplification, the 5' - 3' nuclease activity of the polymerase digests the reporter
oligonucleotide probe, thereby separating the reporter molecule from the
quencher molecule. As the amplification is conducted, the fluorescence of the
reporter molecule increases. Accordingly, the amount of amplification
performed can be quantified based on the increase of fluorescence observed.

Oligonucleotides used for PCR primers are usually about 9 to about 75
nucleotides, preferably about 17 to about 50 nucleotides in length. Preferably,
an oligonucleotide for use in PCR reactions is about 40 or fewer nucleotides in
length (e.g., 9, 12, 15, 18, 20, 21, 24, 27, 30, 35, 40, or any number between 9
and 40). Generally specific primers are at least about 14 nucleotides in length.
For optimum specificity and cost effectiveness, primers of 16-24 nucleotides in
length are generally preferred.
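
The length guidelines above can be collected into a small check. The
following Python sketch is illustrative only and not part of the original
disclosure. It treats the approximate ranges in the text as hard cutoffs,
which is a simplification, and the function and key names are hypothetical.
The example primer is the 18-nucleotide sequence given elsewhere in this
description as SEQ ID NO:29.

```python
def primer_length_notes(primer: str) -> dict:
    """Compare a primer's length against the ranges stated in the text."""
    n = len(primer.replace(" ", ""))
    return {
        "length": n,
        "usual_9_to_75": 9 <= n <= 75,          # "usually about 9 to about 75"
        "preferred_17_to_50": 17 <= n <= 50,    # "preferably about 17 to about 50"
        "at_most_40": n <= 40,                  # "about 40 or fewer"
        "at_least_14": n >= 14,                 # "at least about 14" for specificity
        "window_16_to_24": 16 <= n <= 24,       # "16-24 ... generally preferred"
    }


notes = primer_length_notes("ggc cac cac ctg gcc tac")
print(notes)
```

A length check of this kind says nothing about sequence composition or
melting temperature, which also affect primer performance.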

Those skilled in the art can readily design primers for use in processes such
as PCR. For example, potential primers for nucleic acid amplification can be
used as probes to determine whether the primer is selective for a single target
and what conditions permit hybridization of a primer to a target within a sample
or complex mixture of nucleic acids.

The present invention also contemplates use of the present nucleic acid
polymerases in combination with other procedures or enzymes. For example,
the polymerases can be used in combination with additional reverse transcriptase
or another DNA polymerase. See U.S. Pat. No. 5,322,770, incorporated by
reference herein.

In another embodiment, nucleic acid polymerases of the invention with
5' - 3' exonuclease activity are used to detect target nucleic acids in an invader-
directed cleavage assay. This type of assay is described, for example, in U.S.
Patent 5,994,069. It is important to note that the 5' - 3' exonuclease of DNA
polymerases is not really an exonuclease that progressively cleaves nucleotides
from the 5' end of a nucleic acid, but rather a nuclease that can cleave certain
types of nucleic acid structures to produce oligonucleotide cleavage products.
Such cleavage is sometimes called structure-specific cleavage.

In general, the invader-directed cleavage assay employs at least one pair
of oligonucleotides that interact with a target nucleic acid to form a cleavage
structure for the 5' - 3' nuclease activity of the nucleic acid polymerase.
Distinctive cleavage products are released when the cleavage structure is cleaved
by the 5' - 3' nuclease activity of the polymerase. Formation of such a target-
dependent cleavage structure and the resulting cleavage products is indicative of
the presence of specific target nucleic acid sequences in the test sample.

Therefore, in the invader-directed cleavage procedure, the 5' - 3' nuclease
activity of the present polymerases is needed as well as at least one pair of
oligonucleotides that interact with a target nucleic acid to form a cleavage
structure for the 5' - 3' nuclease. The first oligonucleotide, sometimes termed the
"probe," can hybridize within the target site but downstream of a second
oligonucleotide, sometimes termed an "invader" oligonucleotide. The invader
oligonucleotide can hybridize adjacent to and upstream of the probe
oligonucleotide. However, the target sites to which the probe and invader
oligonucleotides hybridize overlap such that the 3' segment of the invader
oligonucleotide overlaps with the 5' segment of the probe oligonucleotide. The
5' - 3' nuclease of the present polymerases can cleave the probe oligonucleotide
at an internal site to produce distinctive fragments that are diagnostic of the
presence of the target nucleic acid in a sample. Further details and methods for
adapting the invader-directed cleavage assay to particular situations can be found
in U.S. Patent 5,994,069.

One or more nucleotide analogs can also be used with the present
methods, kits and with the nucleic acid polymerases. Such nucleotide analogs
can be modified or non-naturally occurring nucleotides such as 7-deaza purines
(i.e., 7-deaza-dATP and 7-deaza-dGTP). Nucleotide analogs include base
analogs and comprise modified forms of deoxyribonucleotides as well as
ribonucleotides. As used herein the term "nucleotide analog" when used in
reference to targets present in a PCR mixture refers to the use of nucleotides
other than dATP, dGTP, dCTP and dTTP; thus, the use of dUTP (a naturally
occurring dNTP) in a PCR would comprise the use of a nucleotide analog in the
PCR. A PCR product generated using dUTP, 7-deaza-dATP, 7-deaza-dGTP or
any other nucleotide analog in the reaction mixture is said to contain nucleotide
analogs.

The invention also provides kits that contain at least one of the nucleic
acid polymerases of the invention. Individual kits may be adapted for
performing one or more of the following procedures: DNA sequencing, DNA
amplification, RNA amplification and/or primer extension. Kits of the
invention comprise a DNA polymerase polypeptide of the invention and at least
one nucleotide. A nucleotide provided in the kits of the invention can be labeled
or unlabeled. Kits preferably can also contain instructions on how to perform
the procedures for which the kits are adapted.

Optionally, the subject kit may further comprise at least one other reagent
required for performing the method the kit is adapted to perform. Examples of
such additional reagents include: another unlabeled nucleotide, another labeled
nucleotide, a balanced mixture of nucleotides, one or more chain terminating
nucleotides, one or more nucleotide analogs, buffer solution(s), magnesium
solution(s), cloning vectors, restriction endonucleases, sequencing primers,
reverse transcriptase, and DNA or RNA amplification primers. The reagents
included in the kits of the invention may be supplied in premeasured units so as
to provide for greater precision and accuracy. Typically, kit reagents and other
components are placed and contained in separate vessels. A reaction vessel, test
tube, microwell tray, microtiter dish or other container can also be included in
the kit. Different labels can be used on different reagents so that each reagent
can be distinguished from another.

The following Examples further illustrate the invention and are not
intended to limit the scope of the invention.

EXAMPLE 1: Cloning of Thermus scotoductus, Strain X-1 Polymerase

Growth of bacteria and genomic DNA isolation

Thermus scotoductus (Tsc) strain X-1 was obtained from ATCC (ATCC
Deposit No. 27978). The lyophilized bacteria were revived in ATCC Culture
Medium 461 (Castenholz TYE medium) and grown overnight to stationary
phase. Thermus scotoductus genomic DNA was prepared using a Qiagen
genomic DNA preparation protocol and kit (Qiagen).

Cloning methods

The first forward and reverse primers were designed by analysis of 5' and
3' terminal homologous conserved regions of the DNA sequences of Thermus
aquaticus (Taq), Thermus thermophilus (Tth), Thermus filiformis (Tfi), Thermus
caldophilus (that was determined to actually be Tth strain GK24), and Thermus
flavus (believed to be Thermus igniterrae). A fragment of a Thermus
scotoductus polymerase gene was amplified using N-terminal primer 5'- ggc
cac cac ctg gcc tac -3' (SEQ ID NO:29) and C-terminal primer 5'- ccc acc tcc
acc tcc ag -3' (SEQ ID NO:30). The PCR reaction mixture contained
2.5 µl of 10x Amplitaq buffer (ABI), 2 mM MgCl2, 60 ng DNA template, 2.5 mM
(each) dNTP, 20 pmol of each primer, and 1.25 units of Amplitaq DNA
polymerase in a 25 µl total reaction volume.

The reaction mixture was heated to 80°C and then the primers were
added. This was followed by a predenaturation step (96°C for 30 s); PCR
cycling for 30 cycles (97°C for 3 s, 56°C for 30 s, 72°C for 3 min) with a
finishing step (72°C for 7 min). This produced an approximately 1.5 kb DNA
fragment that was cloned and sequenced. This cloned fragment showed some
homology to the Tth Polymerase I gene (GenBank accession number 466573)
between nucleotide numbers 644 and 1973.

Direct sequencing of the genomic DNA was used to obtain the sequence
of the 5' terminus of the Thermus scotoductus polymerase gene. The primer used
was 5'- ctg gcc atg ctg aag ctc ttt -3' (SEQ ID NO:31), with a 2-step
thermocycling protocol. A predenaturation step (95°C for 5 min) was followed
by 80 cycles (97°C for 5 sec, 60°C for 4 min). The reaction mixture consisted of
16 µl Big Dye VI Ready Reaction mix, 2.8 µg DNA, and 15 pmol primer in a 40 µl
total reaction volume. The sequencing of the Thermus scotoductus gene from
genomic DNA revealed that the 5' terminal sequence of the wild-type Thermus
scotoductus gene is 5'- ata agg gcg atg ctg ccc ctc ttt gag -3' (SEQ ID NO:32),
which would indicate that the ATG is the start codon of the wild-type gene.
However, the Taq, Tth and Tfi enzymes have two methionine
amino acid residues at their N-terminal end separated by two amino acids. In
order to make the Thermus scotoductus N-terminus more similar to the other
known Thermus DNA polymerases, and possibly to improve protein translation
efficiency, the ATA codon was changed to ATG. This introduced an additional
start for protein translation, making the recombinant protein N-terminus MRAM.
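
The effect of the ATA to ATG change can be checked by translating the 5'
terminal sequence. The following Python sketch is illustrative and not part
of the original disclosure. It assumes that the printed sequence of SEQ ID
NO:32 reads "ctc" where the extracted text shows "etc", and it uses only the
standard genetic code entries needed for these nine codons. The function and
variable names are hypothetical.

```python
# Subset of the standard genetic code covering the codons in this sequence.
CODON_TABLE = {
    "ata": "I", "atg": "M", "agg": "R", "gcg": "A", "ctg": "L",
    "ccc": "P", "ctc": "L", "ttt": "F", "gag": "E",
}


def translate(dna: str) -> str:
    """Translate a DNA string in reading frame 0, ignoring spaces and case."""
    seq = dna.replace(" ", "").lower()
    usable = len(seq) - len(seq) % 3  # drop any incomplete trailing codon
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# 5' terminal sequence of the wild-type gene (SEQ ID NO:32, with "ctc" assumed).
wild_type = "ata agg gcg atg ctg ccc ctc ttt gag"

# Recombinant form: the first codon ATA is replaced by ATG.
recombinant = "atg" + wild_type.replace(" ", "")[3:]

print(translate(wild_type))
print(translate(recombinant))
print(translate(recombinant)[:4])
```

Reading the wild-type sequence from its first ATG instead (the fourth codon)
gives a protein that begins three residues later, which matches the
statement that the recombinant form carries three additional N-terminal
amino acids.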

The amplification of the full-length Thermus scotoductus nucleic acid
polymerase coding region was carried out using the 5' forward primer 5'- cat atg
agg gcg atg ctg ccc ctc -3' (SEQ ID NO:33). Another consideration when
designing this primer was to introduce a recognition site for the restriction
enzyme Nde I (catatg, SEQ ID NO:34). This sequence was introduced to
facilitate subcloning of the coding region into other plasmid vectors.

As described above, the first cloned portion of the Thermus scotoductus,
strain X-1 polymerase gene was only 1.2 kb. This represented approximately
half of the full-length gene. In order to obtain a larger fragment of the Thermus
scotoductus gene, a PCR reaction was carried out using the 5' forward primer
(SEQ ID NO:33) described in the previous paragraph and a new primer designed
near the same homologous 3' region of the known Thermus polymerase genes.
The sequence of this primer was 5'- ctc cac ctc cag ggg cac -3' (SEQ ID NO:35).
The PCR reaction was the same mixture as above. The cycling conditions were
altered slightly in order to promote greater specificity. The reaction mixture was
heated to 80°C and then the primers were added. This was followed by a
predenaturation step (96°C for 2 min); PCR cycling for 10 cycles (97°C for 10 s,
70°C for 3 min), 25 cycles (97°C for 10 s, 60°C for 3 min), with a finishing step
(72°C for 7 min). This produced a 2.4 kb fragment that was cloned and
sequenced. This left to be sequenced a short 3' terminal region of the Thermus
scotoductus, strain X-1 polymerase gene.

Based on the additional sequence of the larger fragment of the Thermus
scotoductus polymerase gene, a new primer was designed to obtain the
remaining unknown 3' sequence: 5'- ctg gcc atg gtg aag ctc ttt -3' (SEQ ID
NO:36). The genomic sequencing protocol was the same as described for the
previous genomic DNA sequencing reaction for the 5' terminus. Once the
sequence was obtained, a primer was designed to be used with the 5' terminal
primer described above to amplify the full length Thermus scotoductus
polymerase gene. This primer is complementary to the 3' terminal sequence. It
also has a Sal I recognition site (gtcgac, SEQ ID NO:37) overlapping with the
stop codon. This restriction site will facilitate subcloning into other plasmid
DNA vectors. The sequence of the primer is 5'- gtc gac tag gcc ttg gcg aaa gcc a
-3' (SEQ ID NO:38).
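
The placement of the two restriction sites can be verified directly from the
primer sequences. The following Python sketch is illustrative and not part of
the original disclosure. It assumes that SEQ ID NO:33 reads "ctc" where the
extracted text shows "etc", and the helper names are hypothetical. Because
the reverse primer is complementary to the coding strand, its reverse
complement is taken to view the stop codon in the coding orientation.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")


def clean(seq: str) -> str:
    """Remove spacing and normalize case."""
    return seq.replace(" ", "").lower()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def find_site(seq: str, site: str) -> int:
    """0-based index of the first occurrence of `site`, or -1 if absent."""
    return clean(seq).find(clean(site))


forward_primer = "cat atg agg gcg atg ctg ccc ctc"      # SEQ ID NO:33 ("ctc" assumed)
reverse_primer = "gtc gac tag gcc ttg gcg aaa gcc a"    # SEQ ID NO:38

NDE_I = "catatg"   # SEQ ID NO:34
SAL_I = "gtcgac"   # SEQ ID NO:37

coding_view = reverse_complement(reverse_primer)

print(find_site(forward_primer, NDE_I))   # Nde I site position in the forward primer
print(coding_view)
print(find_site(coding_view, "tag"), find_site(coding_view, SAL_I))
```

In the coding orientation the stop codon tag and the Sal I site share a
single g, consistent with the statement that the site overlaps the stop
codon.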

Three different cloned Thermus scotoductus polymerase genes were
sequenced independently in order to rule out PCR errors. The resulting
consensus sequence is the natural Thermus scotoductus polymerase gene
sequence of this invention (SEQ ID NO:14). The amino acid numbering used in
this description of the invention is based on a recombinant form of the Thermus
scotoductus polymerase protein that has an additional three amino acids at its N-
terminus (SEQ ID NO:13). However, SEQ ID NO:14 is the sequence for the
wild type Thermus scotoductus polymerase from strain X-1.

The amino acid sequence of the strain X-1 Thermus scotoductus
polymerase has several differences when compared with the amino acid
sequence of Thermus aquaticus DNA Polymerase, including about 51
conservative amino acid changes and about 62 nonconservative amino acid
changes. For example, one region of dissimilarity is between amino acid
positions at approximately 51 and about 65, where the sequence of the Thermus
scotoductus polymerase has about four amino acid changes (in bold):
LLKALREDGDWIVVFDAKAPSFRHQTYE (SEQ ID NO:39). Another
region of dissimilarity is between amino acid positions at approximately 201 and
about 236, where the sequence of the Thermus scotoductus polymerase has about
seven amino acid changes (in bold): GEKTAAKLIREWGSLENLLKHLEQVKPASVREKILS
(SEQ ID NO:40). Another region of dissimilarity is between
amino acid positions at approximately 311 and about 350, where the sequence of
the Thermus scotoductus polymerase has about seven amino acid changes (in
bold): VGYVLSRPEPMWAELNALAAAWEGRVYRAEDPLEALRGLG
(SEQ ID NO:41). Another region of dissimilarity is between amino acid
positions at approximately 415 and about 435, where the sequence of the
Thermus scotoductus polymerase has about five amino acid changes (in bold):
RLYAALLERLKGEERLLWLYE (SEQ ID NO:42). Another region of
dissimilarity is between amino acid positions at approximately 531 and about
562, where the sequence of the Thermus scotoductus polymerase has about six
amino acid changes (in bold): PIVDRILQYRELSKLKGTYIDPLPALVHPKTN
(SEQ ID NO:43). Another region of dissimilarity is between
amino acid positions at approximately 801 and about 836, where the sequence of
the Thermus scotoductus polymerase has about eight amino acid changes (in
bold): EEVAQEAKRTMEEVWPLKVPLEVEVGIGEDWLSAKA (SEQ ID
NO:44). Hence, many regions of the Thermus scotoductus polymerase differ
from the Thermus aquaticus and Thermus thermophilus DNA Polymerases.

Modification of Strain X-1 Polymerase Wild-Type Gene

In order to produce Thermus scotoductus polymerase in a form suitable
for dye-terminator DNA sequencing, two amino acid substitutions were made.
These are the FS (Tabor and Richardson, 1995 PNAS 92: 6339-6343) and
exo-minus (G46D mutation) mutations. To reduce the exonuclease activity to very
low levels, the mutation G46D was introduced. To reduce the discrimination
between ddNTPs and dNTPs, the mutation F666Y was introduced.

Mutagenesis was carried out using the modified QuikChange™
(Stratagene) PCR mutagenesis protocol described in Sawano & Miyawaki
(2000), Nucleic Acids Research Vol. 28. The mutated gene was resequenced
completely to confirm the introduction of the mutations and to ensure that no
PCR errors were introduced.

The Thermus scotoductus, strain X-1, polymerase gene (FS, exo-) was
removed from the cloning vector by restriction digest with NdeI and SalI. The
2.4 kb gene was ligated into the pT7 expression vector (Brookhaven National
Laboratories, Long Island, NY). The resulting vector containing the Thermus
scotoductus polymerase (FS, exo-) gene was used to transform BL21 E. coli cells
(Invitrogen).

EXAMPLE 2: Thermus scotoductus, Strain X-1 Polymerase Expression and
Purification

BL21 E. coli cells (Invitrogen) containing the pT7 expression vector with
the Thermus scotoductus, strain X-1 polymerase coding region were grown in
one liter of Terrific Broth (Maniatis) to an optical density of 1.2 OD, and the
polymerase protein was overproduced by four-hour induction with 1.0 mM
IPTG. The cells were harvested by centrifugation, washed in 50 mM Tris (pH
7.5), 5 mM EDTA, 5% glycerol, 10 mM EDTA to remove growth media, and the
cell pellet was frozen at -80°C.

To isolate the Thermus scotoductus, strain X-1 polymerase, the cells
were thawed and resuspended in 2.5 volumes (wet weight) of 50 mM Tris (pH
7.2), 400 mM NaCl, 1 mM EDTA. The cell walls were disrupted by sonication
and the resulting E. coli cell debris was removed by centrifugation. The
resulting lysate was pasteurized in a water bath (75°C for 45 min), denaturing
and precipitating the majority of the non-thermostable E. coli proteins and
leaving the thermostable Thermus scotoductus, strain X-1 polymerase in
solution. E. coli genomic DNA was removed by coprecipitation with 0.3%
polyethyleneimine (PEI). The cleared lysate was then applied to two columns in
series: (1) a Biorex 70 cation exchange resin that chelates excess PEI and (2) a
heparin-agarose column (dimensions to be provided) that retains the polymerase.
This column was washed with 5 column volumes of 20 mM Tris (pH 8.5),
5% glycerol, 100 mM NaCl, 0.1 mM EDTA, 0.05% Triton X-100 and 0.05%
Tween-20 (KTA). The protein was then eluted with a 0.1 to 1.0 M NaCl linear
gradient. The polymerase eluted at 0.8 M NaCl. The eluted Tsc polymerase was
concentrated and the buffer exchanged using a Millipore concentration filter
(30 kD M.W. cutoff). The concentrated protein was stored in KTA (no salt)
plus 50% glycerol at -20°C.
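The position of the 0.8 M elution point within the 0.1 to 1.0 M NaCl linear gradient can be worked out with a small sketch. The text does not state the total gradient volume, so the calculation is expressed as a fraction of the gradient delivered. The function names are illustrative, and an ideal linear gradient with no mixing delay is assumed.

```python
GRADIENT_START_M = 0.1  # NaCl at the start of the gradient (from the text)
GRADIENT_END_M = 1.0    # NaCl at the end of the gradient (from the text)


def nacl_at_fraction(fraction: float) -> float:
    """NaCl molarity after the given fraction (0..1) of an ideal linear gradient."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return GRADIENT_START_M + fraction * (GRADIENT_END_M - GRADIENT_START_M)


def fraction_at_nacl(molarity: float) -> float:
    """Fraction of the gradient delivered when the given molarity is reached."""
    if not GRADIENT_START_M <= molarity <= GRADIENT_END_M:
        raise ValueError("molarity lies outside the gradient")
    return (molarity - GRADIENT_START_M) / (GRADIENT_END_M - GRADIENT_START_M)


if __name__ == "__main__":
    # Elution at 0.8 M corresponds to (0.8 - 0.1) / (1.0 - 0.1) of the gradient.
    print(f"{fraction_at_nacl(0.8):.3f} of the gradient delivered at 0.8 M NaCl")
```

Under these assumptions the polymerase elutes roughly seven-ninths of the way through the gradient. Multiplying that fraction by the actual gradient volume, once known, gives the expected elution volume.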

The activity of the polymerase was measured using the standard salmon
sperm DNA radiometric activity assay, and sequencing was tested using the Big
Dye Version 3. The enzyme is active in 40-80 mM Tris, 1.0-2.0 mM MgCl2 with a
dNTP mix consisting of 0.2 mM dATP, 0.2 mM dCTP, 0.2 mM dUTP, and
0.3 mM dITP, at pH 8.0-10.0, with optimal activity between pH 9.0 and 9.58.
The enzyme is also active in KCl concentrations from 0 to 100 mM, indicating
that the T. scotoductus, strain X-1 polymerase is more salt-tolerant than either
Tfil or Taq, but not quite as salt-tolerant as Tth.

EXAMPLE 3: Thermus scotoductus Strains SM3 and Vi7a

The same primers used to amplify the full-length gene encoding the
polymerase from Thermus scotoductus (Tsc) strain X-1 were used to amplify the
polymerase genes from two additional strains of Thermus scotoductus: strain
SM3 and strain Vi7a. The PCR reaction mixture used to amplify nucleic acids
encoding the Thermus scotoductus polymerase from strains SM3 and Vi7a
contained 2.5 µl of 10x Amplitaq reaction buffer (Applied Biosystems), 2 mM
MgCl2, 70 to 100 ng genomic DNA template, 0.2 mM (each) dNTPs, 20 pmol of
each primer, and 1.25 units of Amplitaq in a 25 µl total reaction volume. The
reaction was started by adding a premix containing enzyme, MgCl2, dNTPs,
buffer and water to another premix containing primer and template preheated at
80°C. The entire reaction mixture was then denatured (30 sec at 96°C) followed
by 30 PCR cycles (97°C for 3 sec, 62°C for 30 sec, 72°C for 3 min) with a
finishing step (72°C for 7 min).
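The arithmetic implied by the reaction mixture and cycling program above can be sketched as follows. All quantities are taken from the text. The function names are illustrative. The time estimate counts programmed hold times only and ignores thermal cycler ramping, which the text does not specify.

```python
REACTION_VOLUME_UL = 25.0  # total reaction volume
BUFFER_STOCK_X = 10.0      # 10x reaction buffer
BUFFER_VOLUME_UL = 2.5     # volume of 10x buffer added
PRIMER_PMOL = 20.0         # amount of each primer

INITIAL_DENATURATION_S = 30      # 30 sec at 96 C
CYCLES = 30
CYCLE_STEPS_S = (3, 30, 3 * 60)  # 97 C, 62 C and 72 C holds per cycle
FINAL_EXTENSION_S = 7 * 60       # 72 C finishing step


def final_buffer_strength_x() -> float:
    """Final buffer strength (in 'x') after dilution into the reaction."""
    return BUFFER_STOCK_X * BUFFER_VOLUME_UL / REACTION_VOLUME_UL


def primer_concentration_um() -> float:
    """Final concentration of each primer; pmol per µl equals µM."""
    return PRIMER_PMOL / REACTION_VOLUME_UL


def total_hold_time_s() -> int:
    """Sum of programmed hold times, excluding temperature ramping."""
    return INITIAL_DENATURATION_S + CYCLES * sum(CYCLE_STEPS_S) + FINAL_EXTENSION_S


if __name__ == "__main__":
    print(f"buffer: {final_buffer_strength_x():.1f}x")
    print(f"each primer: {primer_concentration_um():.1f} µM")
    print(f"hold time: {total_hold_time_s() / 60:.0f} min")
```

The buffer is diluted to 1x, each primer is present at 0.8 µM, and the programmed holds add up to 6840 seconds (114 minutes).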

These PCR reactions each produced a DNA fragment of approximately 2.5 kb.
The amplified fragments were purified from the PCR reaction mixes using a
Qiagen PCR cleanup kit (Qiagen). The Thermus scotoductus fragments were
ligated into the inducible expression vector pCR®4-TOPO® (Invitrogen,
Carlsbad, CA). Three different cloned Thermus scotoductus polymerase genes
from each strain were sequenced independently in order to rule out PCR errors.
The resulting consensus sequences for the wild-type genes are reported in
Figures 1 and 3 below.

There are several silent changes at the DNA level among the three genes.
Only the changes resulting in a different amino acid are noted in the alignment
of amino acid sequences provided in Figure 2. The Thermus scotoductus, strain
SM3 polymerase has five positions that have different amino acids compared to
strain X-1. The Thermus scotoductus, strain Vi7a polymerase has four
differences when compared to the amino acid sequence of the polymerase from
strain X-1. These are indicated with boldface in Figure 2.
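Counting amino acid differences between two aligned polymerase sequences, as summarized above for strains SM3 and Vi7a relative to strain X-1, can be sketched as below. The function name and the toy sequences are hypothetical. The actual sequences are those shown in Figure 2 and are not reproduced here. The sketch assumes the two sequences are already aligned and of equal length.

```python
def amino_acid_differences(reference: str, other: str) -> list:
    """List (position, reference_residue, other_residue) for each mismatch.

    Positions are 1-based. Both sequences must already be aligned and
    have the same length.
    """
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index, ref_residue, other_residue)
        for index, (ref_residue, other_residue) in enumerate(zip(reference, other), start=1)
        if ref_residue != other_residue
    ]


if __name__ == "__main__":
    # Toy aligned peptides for illustration only, not the polymerase sequences.
    for position, ref_residue, other_residue in amino_acid_differences("MKLAEV", "MRLAEI"):
        print(f"{ref_residue}{position}{other_residue}")
```

Silent changes at the DNA level do not appear in such a comparison, because they leave the translated sequence unchanged.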

Modification of Polymerases from Strains SM3 and Vi7a

In order to produce the polymerases from Thermus scotoductus strains
SM3 and Vi7a in a form suitable for dye-terminator DNA sequencing, two
amino acid substitutions were made in each gene. These are the FS mutation
(U.S. Patent 5,614,365; Tabor and Richardson, 1995 PNAS 92: 6339-6343) and the
exo-minus mutation (G46D Patent, Joyce papers) that were described in the
patent application. As described previously, mutagenesis was carried out using
the modified QuickChange™ (Stratagene) PCR mutagenesis protocol described
in Sawano & Miyawaki (2000), Nucleic Acids Research Vol. 28. The mutated
genes were resequenced completely to confirm the introduction of the mutations
and to ensure that no PCR errors were introduced.



Protein expression and purification

The FS, exo-minus forms of both Thermus scotoductus polymerase genes
were subcloned into the pET expression vector using the NdeI and SalI restriction
sites. BL21 cells (Invitrogen) were transformed with this expression construct.
The cells were grown in one liter of Terrific Broth (Maniatis) to an optical
density of 1.2 OD, and the proteins were overproduced by four-hour induction
with 1.0 mM IPTG. The cells were harvested by centrifugation, washed in 50
mM Tris (pH 7.5), 5 mM EDTA, 5% glycerol, 10 mM EDTA to remove growth
media, and the cell pellet was frozen at -80°C.

To isolate the Thermus scotoductus, strain SM3 and Vi7a polymerases,
the cells were thawed and resuspended in 2.5 volumes (wet weight) of 50 mM
Tris (pH 7.2), 400 mM NaCl, 1 mM EDTA. The cell walls were disrupted by
sonication and the resulting E. coli cell debris was removed by centrifugation.
The resulting lysate was pasteurized in a water bath (75°C for 45 min),
denaturing and precipitating the majority of the non-thermostable E. coli proteins
and leaving the thermostable Thermus scotoductus polymerase in solution. E.
coli genomic DNA was removed by coprecipitation with 0.3%
polyethyleneimine (PEI). The cleared lysate was then applied to two columns in
series: (1) a Biorex 70 cation exchange resin that chelates excess PEI and (2) a
heparin-agarose column that retains the polymerase. This column was washed
with 5 column volumes of 20 mM Tris (pH 8.5), 5% glycerol, 100 mM NaCl,
0.1 mM EDTA, 0.05% Triton X-100 and 0.05% Tween-20 (KTA). The proteins
were then eluted with a 0.1 to 1.0 M NaCl linear gradient. The polymerases
eluted at 0.8 M NaCl. The eluted Thermus scotoductus polymerases were
concentrated and the buffer exchanged using a Millipore concentration filter
(30 kD M.W. cutoff). The concentrated proteins were stored in KTA (no salt)
plus 50% glycerol at -20°C.

The activity of the polymerases was measured using a nicked salmon
sperm DNA radiometric activity assay. Both enzymes are being tested for use in
sequencing using the Big Dye™ V 3.0. The enzymes are active in 40-80 mM
Tris, 1.0-2.0 mM MgCl2 with a dNTP mix consisting of 0.2 mM dATP, 0.2 mM
dCTP, 0.2 mM dUTP, and 0.3 mM dITP, at pH 8.0-10.0, with optimal activity
between pH 9.0 and 9.58.




All publications and patents mentioned in the above specification are
herein incorporated by reference. Various modifications and variations of the
described method and system of the invention will be apparent to those skilled in
the art without departing from the scope and spirit of the invention. Although
the invention has been described in connection with specific preferred
embodiments, it should be understood that the invention as claimed should not
be unduly limited to such specific embodiments. Indeed, various modifications
of the described modes for carrying out the invention that are obvious to those
skilled in the relevant arts are intended to be within the scope of the following
claims.